We’re Doing It All Wrong 



Why the I.T. industry has so much 
trouble delivering quality software 



...and how to do it right 
by George D. Kobak 


We're Doing It All Wrong 


Page 1 of 30 







Copyright 2004 by George D. Kobak 


Contents 

Preface 

Part I 

Introduction 

■ Why we must be doing it all wrong 

1. How We Do It Is Wrong 

■ Code-centric vs. data-centric software development 

2. Examples From The Business World 

■ The content of a business letter is more important than the grammatical rules 


Part II 

3. When We Do It Is Wrong 

■ The 1:10:100:1000 rule of software errors 

4. Pay Me Now To Change Your Oil Or Pay Me Later To Change Your Engine... 

■ Why doing it right the first time (no matter how long it takes) is the most cost-effective approach in the long run


Part III 

5. It’s Just Data 

■ Information can be represented by organized data having an inherent logical data model which 
must be adhered to when processing such data 

■ Data rules as king in the digital world but from a logical standpoint 

6. Working With Information 

■ The key is in the data modeling by using complex pattern recognition 

7. The Power Of Parameters & Prototypes 

■ Even exceptions have patterns that can be parameterized 

■ When in doubt, prototype 

8. Simple But Powerful 

■ The complex part is keeping it simple while making it powerful 

9. If It Can Be Conceptualized, It Can Be Digitized 

■ The digitization of information is just another medium for the expression of ideas and concepts 

10. An Optimal Architecture For Digitally Processing Data 

■ Logical and geometrically Riemannian in structure 

■ Dynamic in nature 

■ Linear in results 

■ Spherical in conceptualization and functionality 


Epilogue 

■ Keep the software soft 





Preface 


This book just had to be written, like the artist who says, “I didn’t ask to paint; I just had to.” That is how I 
came to write this book. The concept first started out humorously during discussions with certain technical 
peers regarding software development. But as I put the thoughts to writing, and a loose structure of ideas 
began to mold into clearly stated prose, the impetus to write about why quality software is not being 
produced and how it can be done correctly continued to grow in a self-perpetuating manner. The more I 
saw flaws around me in the I.T. industry, the more evident the principles became for how to do it right. It 
is hoped that, from reading this book, the reader will grasp these principles and understand them well enough
not only to apply them in whatever software development endeavor they may undertake, but also to
explain them to others. It is also hoped that the book will be enjoyable reading, since technical subjects
can seem so dull to some people.

Grateful acknowledgement goes out to David E. Burns, who discovered the data-centric way well before I
did and mentored me in this approach. He contributed greatly to the ideas in this book and helped
review its contents editorially. I also wish to thank Brian E. Christy for his great assistance in providing
feedback and confirmation from a business standpoint, as well as for his editorial contributions. David
McGoveran, who has been advocating the data-centric approach for literally decades, was an enormous
help in confirming my initial findings. And finally, I wish to thank my wife, Rita, for her special patience and
support during the preparation of this book.





Part I 


Introduction 

■ Why we must be doing it all wrong 


The title of this book implies two things: 1) the whole approach to software development generally taken 
in the I.T. industry is wrong, and 2) there is a real reason why such an approach is all wrong. These are tall 
statements that obviously merit explanation and justification. But imagine if the
automotive industry were to produce cars at the level of quality you generally see in software today. You
eagerly drive your brand new luxury automobile off the dealer’s lot, but 10 days later you get a phone
call saying that a part of the engine needs changing, and could you come in to have it installed?
Meanwhile, your engine has just stalled at a traffic light because the road you’re on is no longer drivable by
the particular car you have. Point taken, you say, but software is different from cars. But is it really?

Software needs an engine to drive it just as an automobile needs an engine to drive it. And there must be an 
interaction between the automobile and the driver (e.g. seat, steering wheel, etc.); so likewise, there must be 
an interaction or “interface” between the software and the user of that software. The desired results are the
same, namely, safe driving. Do we not say that software “crashes” when it suddenly stops working? Do
we now need software insurance just like car insurance in case of such “crashes”? And will the software
insurance premiums depend on which software application is used, just as the make and model of a
vehicle determine to a great extent the insurance cost?

These parallels (between the automotive and computer software industries) emphasize the point that quality 
is a universal principle, carrying the same impact wherever applied or not applied. One can argue, of 
course, that software is one of the most sophisticated of products used by mankind today, and that software 
quality can be much harder to obtain than quality for real “nuts and bolts” commodities, but that doesn’t
change the impact that good or bad quality has on software. In fact, as more and more technology
creeps into the automotive industry itself, quality issues converge. For instance, many models of
automobiles produced today are controlled by on-board computers running software. Additionally, the list
of engineering disciplines required even to produce an automobile nowadays is quite formidable (e.g.
electronics, mechanical engineering, thermodynamics, hydraulics, ergonomics, optics, chemical
engineering, etc.).

Software problems have become tolerated and even expected by software users nowadays (e.g. “My 
computer is a bit slow right now; could you wait a minute, sir, before I can access your account...”). 

Quality has become somewhat elusive in software development, especially when change occurs (e.g. “The
system has had problems ever since that last fix went in...”). And to make matters worse, there is a 
domino effect with software; one misstep in the software application can cause a cascading number of 
problems, whether immediately noticeable or not. Now we come back to the title of this book and ask: 

Does it always have to be that way? Could there be something wrong with the approach itself that is taken 
to produce software? And if so, is there another approach that would greatly enhance the possibility of 
achieving quality? The answers to these questions are what this book is all about. 





1. How We Do It Is Wrong 

■ Code-centric vs. data-centric software development 


Software developers often define their skill set by their experience in this or that computer language (e.g.
Java, C++, etc.). Database administrators (DBAs) will classify their skills based on the database 
technology that they may specialize in (e.g. Oracle, SQL Server, etc.). And on and on the list goes. The 
I.T. industry has been fragmented into many different specialties, with the assumption that these specialties 
will all tie in somehow for the benefit of the software user. When software is developed, it is like a baton 
in a relay race that gets continually passed from runner to runner. The problem is that with software 
development, the baton is often dropped. 

When this occurs, finger pointing often starts, with each party that has a stake in the SDLC (software development
life cycle) defending its particular role in “carrying the baton.” Passing a baton efficiently in a
relay race depends on two position factors: 1) the position of the baton, and 2) the position of the
runners who will be carrying the baton. Simply put, the two have to mesh. Likewise, for software to be passed
efficiently through the SDLC, two position factors apply: 1) the position of the software from a logical
standpoint, and 2) the position of the environment that carries the software. Again, simply put, the two
have to mesh.

This step-by-step adherence to quality is evident in the automotive industry, with such models as TQM 
(Total Quality Management) where quality is monitored and measured from points A to Z. But if we look 
at the SDLC as it is generally utilized today, we notice that quality is often addressed formally only in a QA
(Quality Assurance) department of some kind, if one exists at all, and then only after the software
has been initially developed. QA then serves almost as a “last resort” attempt to maintain some perceivable
level of quality in the software produced. Although often implied to be, quality is not really inherent 
throughout the SDLC in the sense of it being the focus for every step from A to Z as in the automotive 
industry. 

We have already mentioned that the I.T. industry has been fragmented into many different specialties, but 
the same could be said about the automotive industry. However, something different happens when you try 
to produce a product of ideas (e.g. software) in the same way you would a product of tangible, “nuts and 
bolts” commodities (e.g. an automobile). It’s as if the rules of engagement keep insidiously changing the 
more the software gets worked on. It is not by accident that the word “soft” appears in the word 
“software.” It suggests fluidity, a flow that doesn’t adhere to the Industrial Age widget assembly line way 
of doing things. And as Chapter 3 elaborates, it turns out that even when we develop software
is wrong as well. It’s a matter of how we use the resource of time during the SDLC in an effort to incorporate
quality into the software. All of this suggests that the whole approach to the SDLC may be wrong. But to
really determine this, we first need to capture in a word or two the essence of that wrong
approach. For this we can use the expression “code-centric.”

The suffix “-centric” in “code-centric” implies that the software code itself is central to the development of the
software. On the surface, this would seem to make sense; after all, software is written as software code.

Here the parallels between a business letter and software are striking, as Chapter 2 shows. For instance, a
business letter, like software, can only be written in some sort of language. In the case of a letter, it would
be a natural language (e.g. English, Spanish, etc.). In the case of software, it would be a computer language
(e.g. Java, C++, etc.). Whatever computer language is used for writing the software code, the whole
premise of quality rests on how well that code performs when executed. So you would think that a
code-centric approach, one of focusing attention on the code itself, would be the key to quality.

Now let us consider some of the evidence that the I.T. industry does indeed focus mainly on the code 
whenever software is being developed. We need look no further than the title often used for the person 
writing the actual code, namely, the “software developer.” Just as a surgeon is viewed as the key player in 
a surgical operation being performed, so likewise, the software developer is considered the key player in 
producing the software. (Note: the anesthesiologist is really the one considered in charge of a surgical 
operation from a medical protocol standpoint, but we are using the analogy from a patient’s perspective). 
After all, no code means no software. True, the systems analyst (or, further back in the SDLC, the business
analyst) plays a key role in defining the software’s requirements, but when the software is tested and fails, it is
generally handed back to the developer. The analyst may become involved if the developer feels that the
software requirements may be out of kilter and thus contributing to the problem at hand. But more often
than not, the software developer, much like our surgeon, is tasked with fixing the “patient,” namely, the
software code. Or the developer may face a faulty environment, which the network administrator in charge
of the firewall may need to address, much as the anesthesiologist may need to correct a drug dosage (the
“environment”). But once these problems are corrected, it is the surgeon who returns to the surgery, and
likewise the software developer who returns to writing the code. A code-centric approach, is it not?

This is not to say that the role of the software developer or the writing of code is not important. Remember 
what the word “code-centric” implies, namely, that the code is the central hub or main focus of attention in 
the SDLC. It is this centricity that we are calling into question in this book. But is there an alternative 
approach that is better? We think there is. 

It is interesting that there is a word which we can use to describe the behavior of software gone awry and 
which just happens to be the opposite in meaning to the suffix “centric.” Something or someone that is not 
behaving according to the norm or according to expected behavior is labeled as being “eccentric.”

Aberrant (read “eccentric”) behavior is what we often see in software nowadays. It’s off center to its 
intended purpose, despite the code-centric approach of centering attention on what makes up the software 
in the first place, namely, the code. This disconnect between the SDLC’s central focus on code and the
occasional lack of focus on the software’s intended purpose (once the software is actually used)
implies that something must be “lost in translation,” as it were. The disconnect can be subtle or can be
glaring, but the principle is often the same. What is it that is “lost in translation”?

Returning to the illustration of the business letter, let’s say that it was composed in a natural language (e.g.
English, Spanish, etc.) that was different from the mother tongue of the letter’s author; in other words, it was
translated. Now if there is a disconnect between the focus on the words used in the translated business 
letter and the lack of proper focus by the reader on the intent of the letter in the sense of missing a key 
point, one would say that the letter was “lost in translation” since the meaning was misread (or 
mistranslated). So what does this missed meaning in the letter correspond to when it comes to software 
missing its intended purpose? 

Well, business letters and software have something else in common besides needing to be written in some 
type of language, in that they both work with information in one way or another. For instance, words 
represent meaning and serve as a bridge to ideas and concepts, which can be construed as information.
Software processes data that, when organized properly, also represents information. We say “information”
rather than “data” because there is a slight but significant difference.

Data could be anything, including meaningless, randomly placed textual characters or gibberish words.

What happens though when the characters are organized into meaningful words and the words are 
organized in such a way as to represent meaningful ideas/concepts? We have a representation of 
information from such data organization. (Note: Information is often simply defined as being processed 
data, but for the sake of discussion here, we are differentiating information from data by referring to the 
abstract, conceptual sense of the word “information” vs. the actual, readable component pieces making up 
“data” that (when taken together) would represent such information.) Therefore, we could say that 
information is represented by data when the data is organized enough to be able to properly represent 
meaning. This point is crucial to the whole issue of where the focus of software development should be. 
Notice that in our discussion about information and data, we didn’t mention code at all as if we’re almost 
ignoring it. If we are then relegating software code to, at best, a secondary role, are we changing the focal 
point of our SDLC? Yes. But what, then, should be the central factor (if it’s not the code), the information 
or the data? 

Well, let us first look at the nature of the machine that would run the software, the computer itself. Now 
computers can sometimes give the impression of being thinking machines, but they’re not. Computers do 
not “think” in the sense of having cognition as humans do. They are just machines that process electrical
signals. The results they give and the speed at which they give them may look like they’re really smart, but 
in fact, they’re as dumb as your coffee table; if they were really smart, they wouldn’t keep crashing the 
software, now would they? If the electrical signals that are processed by these computing machines 
represent either 1 or 0, they are called binary or digital computers. If quantum computing becomes 
possible, the machine processing used will still rely on some type of representative signal in its qubits 
(quantum bits), and these quantum computers will still be machines. It is these signals and the pattern 
thereof that would represent the data; the data, in turn, when properly organized, would represent the 
information that is desired/needed by the user of the software. In this chain of links between the computer 
and the software user lies the answer to where our focus should be in the SDLC. 
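To make the chain from signals to data concrete, here is a minimal Python sketch, offered only as an illustration, showing that a bit pattern has no meaning on its own until an interpretation is applied. The same three bytes read as ASCII text yield the characters “411,” while read as one big-endian unsigned integer they yield an unrelated number. The variable names are invented for this example.

```python
# The same pattern of ones and zeros, interpreted two different ways.
pattern = bytes([0b00110100, 0b00110001, 0b00110001])

# Show the raw bits, the closest thing here to the "signals" the machine stores.
bits = " ".join(f"{byte:08b}" for byte in pattern)

# Interpretation 1: treat each byte as an ASCII character code.
as_text = pattern.decode("ascii")

# Interpretation 2: treat all three bytes as one unsigned integer (big-endian).
as_number = int.from_bytes(pattern, byteorder="big")

print(bits)       # 00110100 00110001 00110001
print(as_text)    # 411
print(as_number)  # 3420465
```

Neither reading is “wrong”; the bits become meaningful data only once a model of how to read them is agreed upon, and that data becomes information only when a person understands what “411” stands for.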

Since information is abstract and only understood by the user and not the machine, software code that is 
written and processed by a computer cannot deal with information directly. However, since it is the data 
(when properly organized) that can represent such information, and since the data itself can be represented 
by the very signals that the machine processes directly, it is the data itself then that is key to our centricity 
question. But remember the comment made that there is a slight but significant difference between 
information and data? If we move our focus from the software code to the data itself, how do we ensure 
the tie-in between the data and the related but distinctly separate information component? 

Since information can be represented by organized data, the data then must be organized in a way that 
makes it usable in various ways while preserving its information-representing qualities. How do we do 
this? A hint can be found in the statement made earlier that the electrical signals (processed by the 
computer) and the pattern thereof represent the data. Now such a pattern can be modeled mathematically. 
In fact, mathematics is really the science of patterns, and since computers “understand” only 
mathematically (e.g. processing signals as ones and zeros), we simply need a process that models the 
pattern of electrical signals to represent the data we’ve organized. This process occurs for data on a storage 
device (e.g. a computer’s hard drive) when translated into ones and zeros by the computer’s processors. 
Incidentally, the same process occurs for the software code itself when it is compiled; the compiler
translates the software code into machine-readable language (machine code), which in turn allows it to be
understood in its simplest form (ones and zeros) by the computer’s processors.
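The translation into ones and zeros described above can be sketched in a few lines of Python. The helper names `to_bits` and `from_bits` are hypothetical, chosen for this illustration, and the sketch assumes the data is UTF-8 text. It shows only that the encoding is a reversible, mathematically defined mapping, so organized data survives the trip down to bits and back.

```python
def to_bits(text: str) -> str:
    """Encode text as UTF-8 and render every byte as eight binary digits."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Reverse to_bits: regroup the digits into bytes and decode as UTF-8."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


record = "complaint"
encoded = to_bits(record)
print(encoded[:16])                  # 0110001101101111  ('c' then 'o')
print(from_bits(encoded) == record)  # True
```

The mapping itself carries no meaning; it only guarantees that the pattern we organized at the data level is the pattern the machine stores and returns.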

Moving up the chain between the computer and the software user, we need to do the same thing in principle 
as done for the electrical signals and their pattern. Just as that pattern is modeled in a way that allows for 
the proper representation of the data, so likewise, we need to model the pattern of the data itself in a way 
that would allow the data to properly represent the information it’s supposed to stand for. This modeling 
entails three stages: conceptual, logical, and physical data modeling. Chapter 5 discusses this modeling in 
greater detail, but for now we’ll describe it briefly here. The conceptual step is the most abstract and is 
formed mostly in the mind. The logical modeling aspect translates the conception of the data architecture
into the logical order, or pattern, into which the data will be organized, perhaps using
data modeling tools to represent such logical architecture. The physical modeling deals with the direct
placement of the data as it relates to the computer itself and its internal components (e.g. a database). 
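As a rough illustration of the difference between the logical and physical stages, the Python sketch below writes a hypothetical logical model (a `DirectoryListing` entity with named, typed attributes) as a dataclass, then maps it to one possible physical model: a table in an in-memory SQLite database, using the standard `sqlite3` module. The entity, its fields, and the table layout are assumptions made for this example; the conceptual stage, which happens in the designer’s mind, appears only as a comment.

```python
import sqlite3
from dataclasses import astuple, dataclass

# Conceptual stage (in the designer's mind, assumed for this sketch):
# "people can look up a business's telephone number by its name."


# Logical stage: the entity and its attributes, independent of any storage.
@dataclass(frozen=True)
class DirectoryListing:
    business_name: str
    phone_number: str  # kept as text: the order of the digits matters


# Physical stage: one concrete placement of that logical model in a database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE directory_listing (
           business_name TEXT PRIMARY KEY,
           phone_number  TEXT NOT NULL
       )"""
)

listing = DirectoryListing("Acme Hardware", "555-0100")
conn.execute("INSERT INTO directory_listing VALUES (?, ?)", astuple(listing))

row = conn.execute(
    "SELECT business_name, phone_number FROM directory_listing WHERE business_name = ?",
    ("Acme Hardware",),
).fetchone()
print(DirectoryListing(*row))
```

The same logical model could just as well be placed in a flat file or a different database; keeping the two stages separate lets the physical placement change without changing what the data means.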

To illustrate this three-step data modeling, say you have three numbers in your mind, 1, 4, and another 1. 
You conceive the idea of using those numbers as a telephone number for obtaining other telephone 
numbers. You then organize the numbers into a logical order of 4, then 1, then the other 1 to produce a 
telephone number (411) that is meaningful for its intended purpose. (Note: We say “logical order” in the
sense of an orderly format rather than logic or reasoning, although logic in that sense (e.g. 2 + 2 = 4,
so 4 - 2 must be 2) may also be involved in such logical data modeling.) This number/usage
organization involves the three numbers themselves as well as the relationships between the three numbers 
such as their sequential order, their position relative to each other, their position relative to use (e.g. dialed 
on a telephone as 4, then 1, then 1 again), etc. The individual numbers themselves could be referred to as 
the data elements in the number set 411, and the various relationships between the individual numbers and 
their environment and between each other and even within the number set 411 can be referred to as the data 
relationships. The physical data model is simple: The number sequence of 411 is used or stored together 
as such in the applicable manner (used in dialing, stored in a telephone book, etc.). And this organization 
of data has a side benefit. The logic inherent in such organization can make the software coding less 
complex. 
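The 411 example can be sketched in Python to show the three ingredients named above: the data elements (the digits), a relationship between them (here only their sequential position), and the side benefit that well-organized data makes the code simpler. The `SERVICE_CODES` table and `describe` function are hypothetical, introduced only for illustration; 411 is the North American number for directory assistance.

```python
# Data elements: the three digits, in no meaningful order yet.
digits = ["1", "4", "1"]

# Relationship: sequential position. The same elements in a different
# order are different data and represent different information.
dialing_order = [1, 0, 2]  # take digits[1], then digits[0], then digits[2]
number = "".join(digits[i] for i in dialing_order)

# Hypothetical physical model: the organized number stored as a single key.
SERVICE_CODES = {"411": "directory assistance"}


def describe(code: str) -> str:
    """Because the data is already organized, the code is a single lookup."""
    return SERVICE_CODES.get(code, "unknown number")


print(number)                     # 411
print(describe(number))           # directory assistance
print(describe("".join(digits)))  # unknown number ("141": same elements, wrong order)
```

Nothing in `describe` has to reason about digits or their order; that logic already lives in how the data was organized.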

Since the data, then, is the bridge between the information it represents and the patterned signals that 
represent such data and that the computer processes, it seems logical (correct reasoning) to place it at the 
center of our SDLC. A “data-centric” approach, is it not? 

As later chapters show, a data-centric approach to the SDLC is much more powerful and reliable than
the commonly used code-centric approach. Remember the business letter example we used? The next
chapter will discuss further how the business world provides powerful examples of what works well and
doesn’t work well when it comes to information, and how such examples apply when it comes to software 
development. 





2. Examples From The Business World 

■ The content of a business letter is more important than the grammatical rules 


If we carefully examine what happens in the real world of business, we can garner many insights into what 
has proven to work and not work when it comes to dealing with information. This knowledge can then 
help us to see why a data-centric approach to software development would work much better than a
code-centric approach.

Let’s start with the very nature of information as it flows through the business world. To keep it in proper 
perspective, let’s also distinguish the two ways in which information is processed, namely, thought 
processing and electronic processing. Either kind of processing always requires some form of information
input and always gives some form of output (if we count getting no results from such processing as
a form of output nonetheless).

In general, the flow of information starts with a want and/or a need. For instance, a business entity may 
want to provide one of its customers with marketing information for a new product. Or a need may arise 
for providing information to address a customer complaint. Let us use the latter case for a step-by-step 
example of information flow, using both thought processing and electronic processing. 

The information flow really starts when the customer voices the complaint to the business entity. 

Remember in Chapter 1 that we referred to the first stage of data modeling as being conceptual. This 
involves conceptualization of the information elements that flow through the processes the software will 
touch on. In the business world, it’s the same thing. So let’s do a little conceptualizing here. 

We have the customer, a complaint, and the voicing of that complaint. Right away, we have relationships 
for these information elements. The customer is the one who has the complaint. The customer wants to 
voice that complaint. The business is the party to whom the customer wants to voice the complaint. These obvious
relationships have yet other not-so-obvious relationships, such as the need for the complaint to be 
addressed by the business receiving the complaint; otherwise, why would the customer voice the 
complaint? This relationship in turn relates to another hidden relationship, namely, the need to determine if 
the complaint is legitimate in order to know how to address the complaint. 

We could really go on and on regarding such relationships, so it is necessary to have scope in our 
conceptual data modeling. So where do we draw the line? Well, we want to stick to the information flow
itself and not branch out too far from the context of such flow. So it is important to model
those information elements and relationships that would be relevant to the information flow itself. This 
relevancy, once modeled, gives us the big picture of what’s going on in our customer complaint process 
from a logical standpoint. Remember that some of the relationships we mentioned are not really concrete, 
physical business transactions per se, but are important nonetheless. 
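One way to write down the conceptual model of the complaint process is as a set of subject-relationship-object statements. The Python sketch below is only an illustration under that assumption; the entity and relationship names are invented here rather than taken from any particular modeling tool, and the “lives in City” statement is a made-up example of a relationship that falls outside the flow. The sketch records the obvious and the hidden relationships from the text, then applies a scope that keeps only the statements relevant to the complaint’s information flow.

```python
from typing import NamedTuple


class Relationship(NamedTuple):
    subject: str
    verb: str
    obj: str
    in_flow: bool  # True if relevant to the complaint's information flow


# Obvious and not-so-obvious relationships from the customer-complaint example.
MODEL = [
    Relationship("Customer", "has", "Complaint", True),
    Relationship("Customer", "voices complaint to", "Business", True),
    Relationship("Business", "must address", "Complaint", True),
    Relationship("Business", "must judge legitimacy of", "Complaint", True),
    # Hypothetical relationship that drifts outside the flow being modeled.
    Relationship("Customer", "lives in", "City", False),
]


def in_scope(model: list[Relationship]) -> list[Relationship]:
    """Keep only the relationships that belong to the information flow."""
    return [r for r in model if r.in_flow]


for r in in_scope(MODEL):
    print(f"{r.subject} --{r.verb}--> {r.obj}")
```

Marking scope explicitly, rather than silently omitting statements, keeps the decision about where to draw the line visible to everyone reviewing the model.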

In the case of the complaint written by hand, the conceptual data modeling is easy. The customer figures 
out what to write, puts it down on paper, and mails it out. His/her thoughts have been transferred to paper,
and the information flow for such actions is respected. The customer intuitively conducts the correct
conceptual data modeling of such information flow by simply thinking on the matter while composing the 
letter on paper. 

So far, we’ve looked at the customer complaint from a thought processing perspective only. When the 
complaint is written down by hand in a letter for mailing, there is no electronic processing involved here. 
But if the complaint is written in an e-mail to be sent to the business e-mail address, then electronic 
processing is definitely involved, and it is here that the principles of good software development then 
apply. Now the parallels between thought processing and electronic processing are striking. They both 
depend upon the correct use of information flow and consequently require correct conceptual data
modeling. So we can learn a lot from examining the conceptual data modeling that intuitively occurs in the
mind of a customer when writing out a complaint letter by hand. It will help us to better understand the
need for focusing on data (e.g. through conceptual data modeling) instead of on software code when it
comes to electronic processing and, ultimately, software quality. So let’s take a closer look at the thought
processes involved when a complaint letter is written out by hand. 





In the customer’s mind is where the concept of the complaint resides. The customer may have voiced it 
audibly to others but now wants to put it down in writing. At this point, there are probably not just 
thoughts but also emotions involved. The process of verbalizing on paper what is thought, felt, or talked
about will cause a reorganization of sorts in the mind in order to formulate words into sentences, sentences
into paragraphs, and so on, until the letter is completed. The customer’s avid desire is to have the content 
of the letter reflect the thoughts, feelings, and possibly verbal commentary that is espoused. Notice that the 
central focus of the customer will be on the content, not the grammatical rules and syntax of the words to 
be used in the letter. The customer will carry on a conceptual data modeling of sorts when selecting words 
or combinations of words that will represent conceptually the information that is intended to be passed on 
to the reader. 

This parallels the case in software development where the central focus is on the data itself instead of on
the software code, its syntax, and so on. Of course, the code is important, just as
grammatical rules are important for a business letter. A grammatically correct business letter bespeaks
professionalism and competence, just as well-written software code indicates a polished programming
style. But if the true intent of the letter is somehow missed or not completely understood by the reader to 
whom the letter is sent, grammatical perfection is not really helpful here. There was a disconnect between 
the words themselves (the data elements representing the conceptual, logical aspects of information) and 
the letter’s intent (the conceptual, logical aspects of information). 

Likewise, code that compiles or even runs well during software testing doesn’t necessarily
provide exactly what the end user asked for. The difference lies in the information flow. Try to
pinpoint any problem related to a misread letter or to malfunctioning software, and it will ultimately come
down to some point or points where the information flow became disconnected.
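The idea that a problem comes down to some point in the information flow can be made concrete. In this hypothetical Python sketch, a complaint passes through three processing steps; every step runs without error, yet one silently drops a required element. The step names, fields, sample values, and the `trace_flow` helper are all invented for illustration, and the check assumes the required information can be listed up front as a set of field names.

```python
from typing import Callable, Dict, List, Optional

# A message is a set of named information elements (hypothetical layout).
Message = Dict[str, str]

# Information the complaint must carry end to end (an assumption for this sketch).
REQUIRED = {"customer", "order_id", "complaint"}


def receive(msg: Message) -> Message:
    """Accept the incoming e-mail unchanged."""
    return dict(msg)


def summarize(msg: Message) -> Message:
    """Shorten the complaint text. Runs without error, but drops the order number."""
    return {"customer": msg["customer"], "complaint": msg["complaint"][:40]}


def route(msg: Message) -> Message:
    """Tag the message with the queue that should handle it."""
    return {**msg, "queue": "customer-service"}


def trace_flow(msg: Message, steps: List[Callable[[Message], Message]]) -> Optional[str]:
    """Return the name of the first step after which required information is missing."""
    for step in steps:
        msg = step(msg)
        if not REQUIRED.issubset(msg.keys()):
            return step.__name__
    return None


email = {"customer": "R. Smith", "order_id": "A-1001", "complaint": "Item arrived damaged."}
print(trace_flow(email, [receive, summarize, route]))  # summarize
```

Each function would pass a test that only checks whether it runs; the disconnect shows up only when the check is phrased in terms of the information that must survive the flow.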

So we see that conceptual data modeling to properly reflect information flow is crucial even when things 
are done by hand or in the mind. This principle is behind such simple expressions as “let me think first” or
“I didn’t think about that.” This same principle applies even more when it comes to software development,
since machines are not capable of cognitive thought as the human brain is. The machine will be relying
completely on the software’s architecture and modeling that human designers have implemented. Even if
the machine is claimed to be “self-correcting,” it will still rely upon an architecture and modeling for that
self-correction process that human designers implemented. If the information flow is not
properly supported in this area, the computer will simply follow the software’s instructions faithfully but
still be prone to fail. And the software will be much more limited in its range of applications, thus resulting
in missed opportunities for its use. It is this vastly increased potential for harnessing computing power via
electronic processing that is the ultimate benefit of a data-centric approach. The key is in having a proper
connection (relationship) among information, data, and electronic processing, as the next chapter
elaborates. There, the reader will be shown evidence that there is a flaw even in when (not just how) the
SDLC gets implemented.





Part II 


3. When We Do It Is Wrong 

■ The 1:10:100:1000 rule of software errors 


The old adage “an ounce of prevention is worth a pound of cure” applies perfectly to software development.
The 1:10:100:1000 rule can literally manifest itself when a production problem is a thousand times more
expensive to fix, or causes a thousand times more loss, than if the problem had been nipped in the bud, as it were,
at the very beginning of good software development, namely, in the design phase of the data-centric
architecture. True, the possibilities can be nearly endless for the number of permutations of events that can
occur in a complex software application. But if the design fundamentals are sound and, as Chapter 8
shows, the “simple but powerful” principle is applied, the high number of permutations doesn’t really
have to become an unmanageable problem, if you get the relationships right. To illustrate, let’s use the
analogy of a traffic jam and the potential for an accident. 
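The arithmetic behind the 1:10:100:1000 rule is simple enough to sketch. The following Python snippet is a minimal illustration, not a costing tool. It assumes the four stages are design, coding, testing, and production, although the text only ties 1x to the design phase and 1000x to production. The $100 base cost is a made-up figure.

```python
# Relative cost multipliers from the 1:10:100:1000 rule.
# ASSUMPTION: the middle two stages are labelled "coding" and "testing";
# the text itself only ties 1x to design and 1000x to production.
COST_MULTIPLIER = {
    "design": 1,
    "coding": 10,
    "testing": 100,
    "production": 1000,
}


def cost_to_fix(base_cost: float, stage: str) -> float:
    """Estimated cost of fixing one defect that is first caught at `stage`."""
    return base_cost * COST_MULTIPLIER[stage]


if __name__ == "__main__":
    base = 100.0  # hypothetical cost of fixing the defect at design time
    for stage in COST_MULTIPLIER:
        print(f"{stage:<10} ${cost_to_fix(base, stage):>10,.2f}")
```

With a hypothetical $100 fix at design time, the same defect costs $100,000 once it reaches production.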

It doesn’t matter how many cars there are in the congested traffic; as long as each car in relation to another 
has enough maneuvering room, there will be no accident. Interestingly, when an accident does occur, the 
party at fault is often heard to say, “I didn’t see the car coming” or “I didn’t know it was there.” After all,
why would a driver want to hit another car? The driver’s comments point to a question of reference.
The same applies to software. A key source of software problems comes from this: A reference was not 
seen, recognized, or known by the computer and/or end user, just like the driver did not see or know about 
the other car. 

For the end user, the reference may have been lost, changed, missed, misspelled, misapplied, forgotten or 
just non-existent. For the computer, the reference may likewise have been lost, changed, missed, 
misspelled, misapplied, or non-existent. Notice we didn’t include “forgotten” for the computer reference 
since computers are like elephants. They don’t forget (unless you erase the data). But it is in this need of 
reference by the computer that the logical aspects of data come into play big time. 
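In a database, a foreign-key check guards against exactly this kind of reference failure: every record that points at another record should point at one that actually exists. The minimal Python sketch below uses hypothetical customer and order data. It shows how a computer can be made to “see” a misspelled or non-existent reference instead of silently following it.

```python
# Hypothetical records: each order refers to a customer by id.
customers = {"C001": "Acme Corp", "C002": "Globex"}

orders = [
    {"order_id": 1, "customer_id": "C001"},
    {"order_id": 2, "customer_id": "C002"},
    {"order_id": 3, "customer_id": "C0O2"},  # misspelled: letter "O" instead of zero
    {"order_id": 4, "customer_id": "C003"},  # refers to a customer that does not exist
]


def unresolved_references(orders, customers):
    """Return the ids of orders whose customer reference cannot be resolved."""
    return [o["order_id"] for o in orders if o["customer_id"] not in customers]


print(unresolved_references(orders, customers))  # [3, 4]
```

A relational database enforces the same rule declaratively with a FOREIGN KEY constraint. Such a constraint would reject orders 3 and 4 when they are inserted, rather than leaving them to be discovered later.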

This point about correct reference is only one of three principles that are at the foundation of software 
“bugs” or problems that occur when the software doesn’t work as it should. The other two principles can 
be phrased as questions: Is it useful to the end user? Does the software follow logic? In a way, the first 
question is paramount. If software is not useful to the end user, what’s the point? No matter how well it 
may have appeared to be designed, this usability factor cannot be ignored; otherwise, the software will just 
become shelfware, actually a waste of money. This factor depends keenly on getting the requirements right 
if it is for customized software, or conducting an honest evaluation if it is for off-the-shelf software. But 
usability issues may still arise even if these criteria are met. 

You may have all the functionality you need for your customized or off-the-shelf software application, but
if it keeps crashing, it’s really not very usable. This brings us to the second question about software
issues: does the software follow logic? Well, if the code compiles it must be okay, right? Not necessarily.
If the compiled code reaches a dead end during execution or gets stuck in an infinite loop, it will give the
impression of being suspended, hence the expressions “the software’s hanging” and “the screen is frozen.”
So we ask again, does the software follow logic?

But why would we even question whether the software follows logic? After all, when software is 
compiled, it is rewritten in machine language so that the code’s instructions can be executed by the 
computer’s processor. That execution will occur in a logical manner as long as the laws of physics for 
electromagnetism hold. A pretty reliable logic flow for the code, don’t you think? So how can we question 
the software’s execution when it comes to logic? 

As hinted in previous chapters, there is a subtle logical flow of information that occurs when software is
run. Remember, the software is processing data that only represents information. Technically speaking,
when software is run on a computer, the computer doesn’t really process data at all (e.g. words, characters,
etc.), but rather the representation of the data, namely the electronic signals produced as electricity
passes through the various computer components. Those signals set bits to an “on” state (value = 1) or an
“off” state (value = 0), and on magnetic media such as a hard drive the bits are recorded as the magnetic
orientation of tiny regions of the disk surface. The same principle applies when the computer is reading or
writing a CD (compact disc). Although the medium is optical rather than magnetic, the laser in the CD drive
distinguishes the same on/off states optically; that pattern is then passed along to the appropriate
components (e.g. to an I/O buffer in memory, then to the hard drive, where it is stored magnetically).

Notice we distinguish the data from the electronic signals that represent that data. But we also
distinguish the data from the information that the data itself represents. As mentioned in Chapter 1, the two
words are often used synonymously in reference works, but the differentiation we’re making here is important
in this book’s discussion. So to repeat from Chapter 1, there is “information” in the abstract, conceptual
sense of the word; then there is “data” as the actual, readable component pieces that, when taken together,
represent such information. To illustrate, let’s look at how “information” flows in natural language (e.g.
English, Chinese, etc.).

In natural language, information is communicated by one means or another through representative symbols.
Some natural languages use scripts that are less symbolic (e.g. the English alphabet) than those of highly
symbolic languages (e.g. Chinese characters, Egyptian hieroglyphics), but they are symbolic nonetheless. When
you write the word “pen” on a piece of paper, the pen you’re referring to is not literally transferred onto
the paper itself. To split hairs again, the word is really placed there as a symbolic representation of the
concept of the writing instrument we call a pen. This representation must be modeled correctly across the
board, all the way down to the electronic signals that a computer would use to store the word “pen” in a
document on its hard drive.
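To make the chain of representation concrete: the word “pen” represents the concept of a pen, and inside the computer the word is in turn represented as a pattern of bits. The Python sketch below assumes UTF-8 encoding, which gives plain English letters the same byte values as ASCII. It simply prints the bit pattern that stands for the word.

```python
def to_bits(text: str) -> str:
    """Return the bit pattern that represents `text` when encoded as UTF-8."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


print(to_bits("pen"))  # 01110000 01100101 01101110
```

Each eight-digit group stands for one letter (p, e, n). A single wrong bit anywhere in that chain yields a different character, which is why the representation has to be modeled correctly at every level.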

To further split hairs on why we say that the word “pen” is a representation of the concept of the actual 
writing instrument rather than the instrument itself, let’s illustrate it this way: A check written for a million 
dollars that clears at the bank may be just a piece of paper, but that piece of paper proved to be a true 
representation of the actual value of a million dollars. On the other hand, the words on the check that read 
“million dollars” represent only the concept of an actual million dollars, not the monetary value itself like 
the check does. That’s why this book uses the word “information” in the abstract, conceptual sense of the
word.

It could be said that “data” is conceptual too, but for all intents and purposes, we use that word to refer to 
the actual, readable component pieces (e.g. words, numbers, special characters, etc.) that when taken 
together can represent such information. Remember that we are discussing information flow as it relates to 
the question of whether the software follows logic or not. This logic is meant in a conceptual sense, since, as
shown earlier, the compiled code will follow literal logic paths just as surely as the laws of physics dictate 
that electrons will swirl around an atom’s nucleus. 

Again, to illustrate: Let’s say that a snippet (small piece) of software code was written for the purpose of 
looping through all the entries in an electronic bank statement to check for accuracy. But on execution, the 
snippet of code gets stuck in an infinite loop of endless checking. It was simply complying with what it was
asked to do. Let’s see what happened. The loop was perhaps designed to run this way: continue repeating
the execution of the designated snippet of code until reaching the end of the bank statement. There is an 
assumption that the code is processing one entry at a time and moving from one entry to the next. What 
could have gone wrong? Well, for one thing, let’s say that the software code did not include the instruction 
to actually move to the next entry in the bank statement. 

In real life, if you were checking each entry in a paper bank statement, it is logical that you would move 
from one entry to the next. You would not keep checking the same entry over and over again ad infinitum; 
otherwise, you would never finish verifying the bank statement. It is this misstep in the software code that
led to the infinite looping. In this way, we can say that the software code was not following logic, that is,
logic in a conceptual sense. 
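The bank statement example can be sketched in a few lines of Python. This is a hypothetical illustration: the `Entry` type, the `advance` flag, and the `max_steps` guard were invented for the demonstration. Setting `advance=False` reproduces the missing instruction described above. The guard exists only so that the demonstration stops with an error instead of hanging the way the real bug would.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    description: str
    amount_cents: int  # whole cents, to avoid floating-point rounding


def verify_statement(entries, expected_total_cents, advance=True, max_steps=10_000):
    """Walk the statement one entry at a time and compare the running total."""
    index = 0
    total = 0
    steps = 0
    while index < len(entries):  # "repeat until reaching the end of the statement"
        steps += 1
        if steps > max_steps:  # demo-only guard; the buggy original has none
            raise RuntimeError(f"stuck on entry {index}: the loop never advanced")
        total += entries[index].amount_cents  # check the current entry
        if advance:
            index += 1  # the "move to the next entry" step the buggy code omits
    return total == expected_total_cents


statement = [
    Entry("Deposit", 50_000),
    Entry("Rent", -120_000),
    Entry("Groceries", -8_550),
]

print(verify_statement(statement, -78_550))  # True
```

Calling `verify_statement(statement, -78_550, advance=False)` raises the `RuntimeError`. Every iteration re-checks the first entry, so the end-of-statement condition can never become true. The code does exactly what it was told; it is the conceptual logic that is broken.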

You may wonder why we have spent so much time on the three principles of software problems when the
chapter title implies that it is when (not just how) we (the I.T. industry) do it (software development)
that is also wrong. The reason is that to address software quality properly, you have to understand the
fundamentals of software problems before you can understand when to address them.

So let us now discuss when software problems should be addressed in the SDLC. As a hint, let’s go back to
the analogy of the word “pen” being a representation of the concept of the actual writing instrument we call
a pen. Remember that this representation needs to be modeled correctly. In this modeling, everything can be
said to have a relationship, even when there is no relationship at all between two entities. How so? Well,
the nature or property of such a relationship is simply one of non-existence, a zero or “null” relationship
in mathematical or logical terms, and it can be modeled that way. But modeling these relationships takes
time.
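In data-modeling terms, “everything has a relationship” can be read as making the absence of a relationship an explicit, recorded value rather than an unexamined gap. The Python sketch below rests on stated assumptions: the entity names and cardinalities are hypothetical, and the string "null" stands in for the zero or null relationship described in the text.

```python
from itertools import combinations

# Hypothetical entities in a small model.
ENTITIES = ["Customer", "Order", "Product", "Weather"]

# Relationships that actually exist, keyed by unordered pair, with their cardinality.
KNOWN = {
    frozenset({"Customer", "Order"}): "one-to-many",
    frozenset({"Order", "Product"}): "many-to-many",
}


def relationship(a: str, b: str) -> str:
    """Every pair of entities has a relationship; an absent one is explicitly 'null'."""
    return KNOWN.get(frozenset({a, b}), "null")


def relationship_map(entities):
    """Model every pair, so that no pair is left unconsidered."""
    return {(a, b): relationship(a, b) for a, b in combinations(entities, 2)}


for pair, kind in relationship_map(ENTITIES).items():
    print(pair, kind)
```

Four entities yield six pairs, and in general n entities yield n(n-1)/2. That growth is one reason this kind of modeling takes time: each "null" entry is a decision someone had to make, not an oversight.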

Time is the resource the SDLC most often lacks in the I.T. industry today, so the idea of taking extra time to
do something is anathema to software project managers. Without wishing to denigrate the project management
profession, it must be said that a project manager may become nothing more than a glorified paper shuffler
in the SDLC if this point is ignored. How so? As they say in the stock market, it’s all a matter of timing.
The stock market is a very
good analogy to this, because it also brings into play the concept of investing. And that’s exactly what is 
being done when you take the resource of time (money) to invest in the SDLC (the stock market) right up 
front in the data-centric architecture design stage (a start-up firm before its IPO) to cash in later (cashing in 
on the IPO) due to having higher quality software (a reputable start-up firm) going to the end user (going 
public). 

But wait a minute, you may say. If we take more time in the SDLC to carefully model these relationships, 
this just adds costs, project release delays, etc. On the surface, this may appear to be the case since it 
definitely takes a considerable amount of time to perform conceptual, logical, and physical data modeling 
for the information flow that the project’s software will be involved with. And since it’s the data that 
represents the information in that flow, we want the data modeling to fit as closely as possible to the
conceptual, abstract, logical aspects of that flow. In fact, the type of data modeling we’re talking about
goes even deeper. 

Since we’re looking for relationships in the data that will represent some sort of information, there have to
be relationships in the information, too. So why not model the information flow itself? We can include the
processes inherent in the flow, and not just the physical transactions and events in those processes. Let’s
add the logical transactions and events as well, from both the technical and business sides. Logical
business transactions, you say? That’s right. Remember, we’re talking about the abstract, conceptual sense 
of information in our modeling of its flow. This information modeling is the conceptual part of our data 
modeling; that’s why it goes so deep, in fact, much deeper than the typical data modeling carried out in the 
SDLC nowadays. 

We can just see the budget analyst breaking pencils over the cost of all this extra work. That is why it is
essential to look at the whole picture, to understand the true costs of software quality (or the lack of it),
before considering this data-centric, logical layered approach to software development. But it really just
boils down to doing it right the first time, although that first time may take a long time. In our analogy
of the traffic jam and the potential for an accident, had the driver been able to see the accident coming,
it would have been avoided, and all the extra checks the driver made beforehand to prevent it would have
paid off.
It’s the same with data-centric software development. The next chapter will elaborate further on why this 
is really the way to go if you truly want quality software development that is cost-efficient at the same time 
(believe it or not). 



4. Pay Me Now To Change Your Oil Or Pay Me Later To Change Your Engine... 

■ Why doing it right the first time (no matter how long it takes) is the most cost-effective in the long 
run 


To test this claim that doing it right the first time is the best for quality software, let’s work with an 
illustration again. It’s obvious that if you never change the oil in your vehicle, you may save some money 
but will eventually damage your engine big time. Now few people will balk at the idea of changing the oil 
in their car on a regular basis, but let’s think about that idea from a software perspective. We’ve already 
used the analogy of an automobile having an engine and a user “interface” just like software does. Now 
let’s focus on the timing of changing the engine oil compared with the timing of data modeling (and all that 
it encompasses) in the SDLC. 

First we start with the initial adding of the engine oil when the engine is first produced and tested in the 
factory. Obviously, without oil at that point the engine would be ruined immediately. Now let’s move the 
scene to when a software project normally begins in the SDLC. We say, normally, in the sense of how the 
I.T. industry does it nowadays. Remember we referred earlier in Chapter 2 to the flow of information 
starting with a want and/or a need. This is how software projects are born. To address that want or need, 
requirements are assessed and gathered by a person or team (e.g. a business systems analyst) who supposedly
understands the business side well enough to concretize that want/need. And what is typically
defined in a software requirements document? You got it, individual steps in the information flow. Now 
remember the connection we made between the data that a computer will process and the information that 
that data represents. In the logical realm of things, all you need is one misstep to throw off the process. A 
requirements flaw can have huge implications down the road. 

Returning to the timing of the engine oil change, the oil itself is often referred to as the lifeblood
of the engine. So we agree it’s very critical. If the oil is gummy, dirty, full of sludge, and increasing in
viscosity, its original purpose starts to weaken. What do you do with such oil? You change it. When do 
you change it? As soon as possible. 

If software is hastily designed, missing requirements, sluggish in performance, and full of hidden errors even
after compiling, its original purpose weakens too. What do you do with such software? You change it. When
do you change it? As soon as possible. That’s what normally happens in the SDLC. But now we’re back to the
seesaw cycle described earlier, in which software is passed back and forth between the developer and the QA
department testing it in order to correct any “bugs” found. As discussed, a data-centric approach is meant
to prevent this. How so? Well, in our engine oil illustration, what if you changed the oil before it got
gummy, dirty, full of sludge, and increasing in viscosity? You would preserve its original purpose, would
you not? The same goes for software.

In a data-centric, logical layered approach, we address information flow from as many angles as is feasible.
We keep “changing the oil” until we get the flow just right. And if the original purpose changes a bit later
on (e.g. end user requirements change), we change the model again if necessary. In other words, we keep the
“oil” for our software engine in prime shape. We don’t wait for the sludge (“bugs”) to start
piling up. But isn’t it cheaper to just develop and test the software rather than making all this fuss about 
information flow, etc.? Not really, and here’s why. 

Software should truly be soft, fluid, open to change, and not rigid in nature. It can be robust (solidly 
functioning) but it is not like hardware. A software application is not a piece of rock. By design, it should 
be flexible to accommodate the needs of the end user. If the architecture is too rigid, you’re going to have 
hardening of the software arteries. This need for fluidity is key to understanding all the fuss being made in 
this chapter over modeling information flow. After all, software only deals with data, more specifically, 
the data that represents the information in that flow. It’s not going to hold your coffee cup (although the 
CD drive on your PC may be mistaken as one). Now how rigid is information flow in the business world? 

Say you carried on a conversation with someone, then spoke again with that same person later on. If in the 
second conversation, you said the exact same words as before, you would sound like a machine, a robot, an 
automaton. That’s why AVR (automated voice response) can sound dull after a while: it’s the same sentences
over and over again. Mind you, the data a computer processes is often similar to previously processed data,
just as a conversation may be similar to one carried on with someone at an earlier date. But inevitably,
even in the digital world of computers, data changes, so the software must be “soft” enough to be able to
handle that change.

The cost justification for doing things right the first time in software development lies in this ability to
handle change, and doing it right means a data-centric, logical layered approach, especially at the concept and
design stage. You see, when the software is designed to accommodate change, you’re really preparing the 
software for its inevitable purpose, namely, to process data that will be changing. Again, we need not 
worry about all the possible permutations that may exist for the steps that the software may take to execute 
its assigned tasks. For proper architectural design that takes into account this ever changing aspect of data, 
we need to establish the pattern that a process will take. There’s that word “pattern” again that was 
discussed in Chapter 1. Patterns can be modeled, thus the need to model our data and the interconnecting 
relationships thereof. 
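One small way to establish the pattern, rather than enumerating permutations, is to treat the processing rules themselves as data. The Python sketch below is one possible interpretation, not a design prescribed by this book. The loop (the pattern) never changes, and accommodating a new field means adding one entry to a rule table. The field names and rules are hypothetical.

```python
from decimal import Decimal
from typing import Callable, Dict

# The rules are data: one entry per field, saying how that field is normalized.
# Field names and rules are hypothetical examples.
FIELD_RULES: Dict[str, Callable[[str], object]] = {
    "name": str.strip,
    "amount": Decimal,  # exact decimal arithmetic for money
    "currency": str.upper,
}


def process(
    record: Dict[str, str],
    rules: Dict[str, Callable[[str], object]] = FIELD_RULES,
) -> Dict[str, object]:
    """Apply the same pattern to every record: look up each field's rule, then apply it."""
    unknown = sorted(set(record) - set(rules))
    if unknown:
        # Surface unmodeled data instead of guessing what to do with it.
        raise KeyError(f"no rule for field(s): {unknown}")
    return {field: rules[field](value) for field, value in record.items()}


print(process({"name": " Acme ", "amount": "19.99", "currency": "usd"}))

# When the data changes (a new "memo" field appears), extend the rules, not the loop.
extended_rules = {**FIELD_RULES, "memo": str.strip}
print(process({"name": "Globex", "amount": "5", "memo": " late fee "}, extended_rules))
```

Rejecting unknown fields outright is a deliberate choice in this sketch. Data that the model does not yet account for is flagged immediately instead of flowing through unnoticed.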

In such modeling, there are several tugs of war that can occur. One is between efficiency and security. 
Another is between customization and modularization. Another is between quality and speed. The 
necessary balance needs to be struck in order to get the best of all worlds. This balance requires not only 
keeping the big picture constantly in mind (in order to fit all the component pieces in correctly) but also 
being able to understand the detail (the component pieces themselves). Unfortunately, this is becoming a 
lost art. 

One reason for this is that technology has become so complex nowadays that it is often impossible for any 
one person to cover all the bases. Another reason is that the I.T. industry has been fragmented out into 
many different specialties of expertise and segments of operation, as explained in Chapter 1. So high-level
architecture is done in a generalist fashion, where those designing the big picture understand the
component pieces in general terms but not in any great detail. This problem of excessive complexity in
technology therefore needs to be addressed as well, and that is what Chapter 8 will do using the “simple but
powerful” principle. On that premise (that the big picture and the details can be known and combined
correctly), we can strike the balance needed for an optimal software architecture.

Now how does all this up-front, in-depth data modeling (including modeling of the information flow)
create a cost-effective system, as suggested in the title of this chapter? Well, remember
we referred to the crucial aspect of timing as in the engine oil change example. Various schools of thought 
are now recognizing the importance of this timing issue in project management. 

For instance, the critical path method used for project scheduling rarely works accurately if at all. How 
many times have projects been considered late or incompletely done at software release time? It is a 
common occurrence. The main cause is easily recognizable if you interview the project participants. “We 
didn’t have enough time to...” is the common response. Why not enough time? Because it was scheduled 
wrong. Since information flow is so dynamic by nature, how can you predict the exact amount of time
needed to solve the problems inherent in developing software that will work in that information flow? 

If the software is to be fluid, your timing and scheduling need to be fluid, too. And such an approach is
proving successful, as ably demonstrated by the project management method called critical chain
scheduling and buffer management. It is beyond the scope of this book to discuss such project management
methods, but the point to be made here is that whatever means you use to conduct the SDLC for your 
software development, there has to be enough time allowed to do the up front design right the first time, 
even if it takes longer than expected. 

Now we get to the point of why such proper timing actually saves money despite having to use more
resources up front than expected. Recall from the previous chapter that a software issue can cost a
thousand times more to address in production than it would if it were addressed at the very beginning of
the SDLC (the 1:10:100:1000 rule). You see, the data-centric, logical layered
approach advocated here forces the addressing of that fluid nature of information flow. The modeling 
requires thought experiments to address “what if...” scenarios at the conceptual stage. A latticework of 
sorts is constructed so as to accommodate change. This accommodation is key to saving costs in the long 
term. 

We say “long term” because if the software being produced is going to be used for only a short time (which
is rarely the case), its quality may not be so crucial, unless of course you’re talking about
mission-critical applications. But generally, software is designed to be used over and over again. In this
case, though, the information flow will often be fluid, thus necessitating a dynamic approach to how
software will deal with the data representing that information. The more the information flow changes, the
greater the cost savings, because the software itself will, by its very nature, break less, need to be
changed less, and fail less, since support for such change was designed into it from the beginning.

Of course, there will always be a limit as to how much time you can spend on the architecture (otherwise, 
you could never get the software released). But it is hoped that the reader sees the importance of timing, 
even from a cost perspective. Thus we say, like the mechanic, pay us now to change the oil (design the
software correctly up front even if it takes longer than expected) or pay us later to change the engine (wait
until major problems hit later because the initial design was bad, or wait until major changes come later
entailing costly changes to the software because the initial design was unaccommodating). But what if
someone were to say that they already spend lots of time on the software architecture, already do extensive
data modeling in their SDLC, and already keep the data in mind when preparing software requirements, so they
do “nip it in the bud” already when it comes to software issues?

The expression “nip it in the bud” comes from the practice of horticulturists snipping off buds in order to get a
better yield of fruit. Well, the data-centric approach we’re talking about here not only nips software 
problems in the bud in order to get a better yield of software quality but also grows a bigger software “fruit 
tree.” Recall the statement made in Chapter 2: “It is this vastly increased potential for harnessing 
computing power via electronic processing that is the ultimate benefit of a data-centric approach.” No 
doubt you, the reader, will concur that this would be a distinct and compelling advantage. But perhaps 
you’re now wondering how exactly such an approach would be undertaken. Well, let’s begin by taking a 
closer look at its central focal point, namely, the data, as well as the information that the data would be 
representing. 



5. It’s Just Data 

■ Information can be represented by organized data having an inherent logical data model which 
must be adhered to when processing such data 

■ Data rules as king in the digital world but from a logical standpoint 


The previous chapters dealt extensively with what is wrong in the I.T. industry today when it comes to how 
software is produced. The finger was pointed at the code-centric approach taken in the SDLC. It was also 
argued and explained why a data-centric, logical layered approach is far better. Having touched upon that
approach in principle, we will now discuss in detail how it would be implemented.

Notice that this chapter’s title is followed by two subtitle phrases. Let us consider the first one:
Information can be represented by organized data having an inherent logical data model which must be 
adhered to when processing such data. This is a profound statement. It is profound because it summarizes 
the tie-in between the two components involved in information flow, namely, the abstract, conceptual 
component we refer to as “information” and its representative component that we call “data.” 

As mentioned earlier in this book, some reference works define information as processed data, where
“data” means data in its “raw” or unprocessed state. We, however, also make a distinction between processed
data and information: when we use the word “information,” we do so in the abstract, conceptual sense of the
word. With that distinction clearly in mind, let us now expound on the tie-in between these two components
(information and data).

To reiterate, the first subtitle phrase for this chapter begins with “Information can be represented by 
organized data...” That tie-in applies to any format the data may take as long as the representation of the 
information can occur. For instance, the data could be in digital, audio, visual, mathematical or even 
natural language format (e.g. unstructured text). Digital format is the way in which data is processed or
stored on computers, based on values of 1 and 0. Data in the other formats mentioned (audio, visual,
mathematical, natural language) may also be in digital format at the same time. So by listing formats other
than digital, we mean to include data that is in audio, visual, mathematical, or natural language format
without being in digital format at all. Here are a few examples:

You’re at a music concert listening to your favorite piece of music being played. The concert happens to 
be in a park, and you’re listening to it live. No speakers, no sound system, no microphones, nothing that 
could possibly be construed as a digital medium. But you distinctly recognize the music. The flow of 
information (the conceptual nature of a musical piece that a human mind would recognize) is being 
represented by data in an audio but non-digital format. What are the actual component pieces that make up 
this data? It’s the sound waves themselves that emanate from the musical instruments being played and 
that travel through the air to the human ear. 

The pattern of sound waves for the musical piece implies a model, a data model of sorts, since we’re labeling
the sound waves as data. What distinguishes these sound waves from those of plain noise? The waves
are highly organized to the point of reflecting the harmony and melody inherent in the musical piece. Even
if the musical arrangement were in a contrapuntal setting instead of a chordal setting, the point and
counterpoint of melodies would still have recognizable patterns of musical notes. This pattern of 
organization, this logical, orderly construct in essence, allows us to describe the musical arrangement’s data 
model as being logical in nature. This logical data model must be adhered to during the playing of the 
musical instruments if we want the musical piece to be played as intended by the composer. And this is 
what the concept of being “in tune” means from the perspective we are discussing here. Sum it up and we 
have this: Sound (e.g. musical) information can be represented by organized data in audio format having 
an inherent logical data model which must be adhered to when processing such data. 
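For musical sound, that inherent logical data model can actually be written down. In the widely used twelve-tone equal temperament, each semitone multiplies the frequency by 2^(1/12), and the scale is conventionally anchored at A4 = 440 Hz. The Python sketch below assumes that tuning system, that reference pitch, and an arbitrary 5-cent tolerance. It checks whether a measured frequency adheres to the model, which is one precise reading of “in tune.”

```python
import math

A4_HZ = 440.0  # assumed reference pitch (A above middle C)
A4_MIDI = 69   # MIDI note number of A4


def nearest_note(freq_hz: float) -> tuple[int, float]:
    """Return (MIDI note number, deviation in cents) of the closest equal-tempered pitch."""
    note = round(A4_MIDI + 12 * math.log2(freq_hz / A4_HZ))
    target_hz = A4_HZ * 2 ** ((note - A4_MIDI) / 12)
    cents = 1200 * math.log2(freq_hz / target_hz)  # 100 cents = one semitone
    return note, cents


def in_tune(freq_hz: float, tolerance_cents: float = 5.0) -> bool:
    """True if the frequency lies within the tolerance of an equal-tempered pitch."""
    return abs(nearest_note(freq_hz)[1]) <= tolerance_cents


print(in_tune(261.63))  # True: within a fraction of a cent of middle C (MIDI 60)
print(in_tune(450.0))   # False: about 39 cents sharp of A4
```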

Now let’s look at an example of data that is in visual format but not in digital format. Quite simply, just 
looking at something with your eyes demonstrates this. Light waves that hit the eye’s retina will be 
translated electrochemically through neuronal pathways to the brain that will interpret the image at hand. 
The light waves themselves will carry a pattern that is different for a car than it is for a boat. Even if you 
saw the same object in a picture taken at the scene, the recognizability of the object (e.g. a car vs. a boat) 
would still depend upon that pattern distinction. Light waves from an object (or a picture of an object) 
serve as the data representing the information processed by the human mind when the sense of vision is
employed. Again, the inherent pattern implies a modeling of sorts. In this case, the light waves will be
organized into a logical, orderly construct in order to correctly represent information on the source of those
light waves. This orderliness is guaranteed by the laws of physics that govern the light waves. And the
inherent logical data model must be adhered to when such light waves are perceived by the human mind, so
the brain will not perceive a car as an automobile one day, then as a boat the next. This pattern
recognition is what is implied in the expression “I see it” when a person is asked if they caught a glimpse
of some object. This form of pattern recognition is also inherent in the rendering of NASA space photos,
especially when direct visual access is not possible. Sum it up and we have this: Visual
information can be represented by organized data in visual format having an inherent logical data model 
which must be adhered to when processing such data. 

When data is in mathematical format, it blends well with computation because, after all, computers only
process data mathematically, and this process is binary or digital in nature. So any data in mathematical but
not digital format (e.g. complex formulas in quantum mechanics) would have to be translated into digital
format before direct processing by the computer’s processor(s). An example of this is when software
written in a computer language designed to handle such formulas (e.g. FORTRAN) gets compiled. The 
formulas themselves must have a pattern of organization because they are mathematical in nature and 
mathematics, after all, is the very science of patterns. Modeled as such, they are bound to have a logical, 
orderly construct by their very nature. Sum it up and we have this: Mathematical information can be 
represented by organized data in mathematical format having an inherent logical data model which must be 
adhered to when processing such data. 
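The sketch below is a minimal illustration of that translation step. It uses Python rather than FORTRAN, an assumption made only for brevity. It evaluates a simple formula, the area of a circle, and then shows the 64-bit IEEE 754 pattern of 1’s and 0’s in which the processor actually holds the result. The helper name to_bits is hypothetical.

```python
import math
import struct

def to_bits(value: float) -> str:
    """Return the 64-bit IEEE 754 binary pattern of a double-precision float."""
    # '>d' packs the float as a big-endian 8-byte double; '>Q' reads the same
    # 8 bytes back as an unsigned 64-bit integer so its bits can be printed.
    (as_int,) = struct.unpack(">Q", struct.pack(">d", value))
    return format(as_int, "064b")

# A formula in mathematical notation: the area of a circle, pi * r^2.
radius = 2.0
area = math.pi * radius ** 2

print(area)           # the value as a human reads it
print(to_bits(area))  # the same value as the processor stores it
```

The formula has an orderly structure, and so does its digital form: one sign bit, eleven exponent bits, and fifty-two fraction bits.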

Now we come to our most challenging of data formats for representing information, namely, natural 
language format. Why so challenging? Because data in natural language format (e.g. unstructured text) 
can be a representation of one of the most abstract forms of information, namely, ideas. A physical object 
(e.g. a duck) can be seen with the naked eye, heard by the ears, touched with the hand, smelled by the nose, 
and tasted by the palate (e.g. in a cooked state). Any of the five senses can be engaged for identifying the 
pattern of data that would represent information on this avian creature. The adage “if it looks like a duck, 
walks like a duck, and quacks like a duck, then it must be a duck” takes its cue from this pattern
recognition principle. But what senses would you use to describe a totally abstract idea, one that defies
description even by mathematical formulas? Probably just words. You may use sketches, diagrams, and other
forms of visual aids but primarily, you will probably explain the details in words. Why? Because natural 
language is the main medium in which ideas are communicated to the human mind. 

The natural process of selecting the right words to describe an idea will focus first on the meaning of those 
words, then their grammatical construction. This we touched on in the example of the business letter in 
Chapter 2. As shown in that chapter, it is content first, grammatical rules second, just like data first, code
second in our data-centric approach. For instance, in putting together words to express thoughts and ideas, 
a person is not going to pick any word out of a hat just because that word happens to fit nicely in the 
grammatical structure of the sentence. The semantic value of the word to be picked will be a much more 
important issue. 

Once the chosen words are assembled together into a sentence, the sentence’s structure will obviously have 
an order to it, both in semantic terms (for the meaning) and syntactical terms (for the grammar). There will 
be a myriad of relationships existing between each word and group of words in the sentence, both at a 
semantic and syntactical level. The semantic level will reflect the idea itself, namely, the conceptual, 
abstract component of information on the idea, which in turn would be represented by the words 
themselves (the data). Now for the real challenge. 

Since natural language can be used to describe very abstract concepts, data in natural language format is 
ideally suited to represent information in very abstract, conceptual form. But recall that computers are as 
dumb as coffee tables. They can’t think, period. There is no cognitive reasoning. They do exactly as they 
are programmed to do. Feed them data in natural language format in the form of unstructured text and 
what do you have? The only way that the computer can process such data in any meaningful way (e.g. find 
a word in a document of free-form text, use a search engine to locate a document through keywords, etc.) is 
to do so mathematically. Therefore, a connection between the mathematical world of computers and the 
linguistic world of natural language has to be made. It is a huge divide or chasm to bridge. That is why
text mining tools, search engine tools, and other “intelligent” software to date remain very limited: when it
comes to natural language, they cannot process data nearly as well as a five-year-old can.
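Here is a deliberately naive sketch of how “finding a word” reduces to arithmetic. The function find_word is hypothetical and is not any search engine’s API. Each character is turned into its Unicode code point, and a match is simply a run of equal integers. Nothing in it understands what the word means.

```python
def find_word(text: str, word: str) -> list[int]:
    """Return every index where `word` occurs in `text`, by numeric comparison only."""
    # Characters become integers (Unicode code points); from here on the
    # "search" is pure comparison of numbers, with no notion of meaning.
    t = [ord(c) for c in text]
    w = [ord(c) for c in word]
    hits = []
    for start in range(len(t) - len(w) + 1):
        if t[start:start + len(w)] == w:
            hits.append(start)
    return hits

doc = "The duck quacked. A decoy duck does not quack."
print(find_word(doc, "duck"))   # [4, 26]
```

Both hits count exactly the same to the computer, even though only the first refers to a live duck.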


It is hoped that by now the reader will begin to understand somewhat the second subtitle phrase for this
chapter, “data rules as king in the digital world but from a logical standpoint.” This ruling principle is
nowhere more evident than when it comes to computers processing data in natural language format.

Whenever we switch our PC on, a series of processes occurs that essentially does one thing: process data.
The work done in I.T. shops used to be commonly referred to as “data processing.” It wasn’t called “code 
processing.” This fact in itself should give a hint as to what is the most important aspect of computing. 

But in the hodgepodge of masses of data stored nowadays in electronic form, it’s like the saying “water,
water everywhere but not a drop to drink.” We are drowning in data but starving for information. What’s 
the problem? The data is not organized well enough to represent information in a meaningful way. It’s as 
simple as that. But why is that so? 

It’s because the data, much of it stored away in unstructured text and therefore in natural language format,
can be processed efficiently only to the extent that it adheres to its underlying logical data model and, by
extension, only to the extent that the logical data model itself adheres to the conceptual data model for the
represented information. That is why we say that data rules as king in the digital world of computers but
from a logical standpoint. The physical data is powerless without the logical layer. The logical layer is 
what gives meaningful organization to the data. And what is that meaning? Why, it’s what the data 
represents, namely, information, the very thing that we’re starving for. That is why our age is called not the 
Data Age but the Information Age. 

Search engine firms are scrambling today to address this issue. Their common theme is that it’s all about 
relevance, and when you think about it, the true relevance of any storehouse of data is simply in the 
information that it represents. In fact, that’s why search engines are so popular; it is a natural process for 
humans to seek information. We’ve been doing it for thousands of years. Any success that search engine 
technology tools or text mining tools may have for the Internet, corporate Intranets, etc. will be dependent 
directly upon this principle (of data adhering to its logical data model), because of the very fact that data in 
natural language format (e.g. unstructured text) will be the most dependent (of all data formats) upon the 
relevant information that it represents. 

Retrieving such relevant information would be very easy if all the information in the world consisted only 
of the same two elements that make up the storage components of all digital data, namely, 1’s and 0’s. You
could then map everything on a one-to-one basis. But we know that is not possible. This book has been
discussing at length the divide that exists between information and the data that represents it, and the fact
that the key to bridging this gap is proper data modeling. So if that’s the key to unlock all that
information out there, how do we go about “making and turning the key”? This will be discussed in the 
next chapter. 
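The data “1234” is a good place to see that divide. In the short sketch below, the helper bits is purely illustrative. The text “1234” and the whole number 1234 are both stored as 1’s and 0’s, yet the two bit patterns have nothing in common. Neither pattern says whether it stands for customers, dollars, or a model number.

```python
def bits(data: bytes) -> str:
    """Render a byte string as space-separated 8-bit groups."""
    return " ".join(format(b, "08b") for b in data)

as_text = "1234".encode("ascii")       # four characters, one byte each
as_number = (1234).to_bytes(2, "big")  # one unsigned whole number in two bytes

print(bits(as_text))    # 00110001 00110010 00110011 00110100
print(bits(as_number))  # 00000100 11010010
```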



6. Working With Information 

■ The key is in the data modeling by using complex pattern recognition 


The previous chapter emphasized that information is not as simple as the 1’s and 0’s of stored digital data.
Information can be complex, sometimes very complex, so complex that at times it seems impossible to put
in words. By now, the reader should be able to understand the expression “to put in words” as really
meaning “to select those words that would correctly represent the information to be conveyed.” We call
this human procedure of communication (using words to convey information in a meaningful way)
“writing” or “speaking,” depending upon the medium of communication used. Such words will always 
have a pattern, and since that pattern will be for representing something that is more than likely complex, 
the pattern itself will often be complex also. It is in the recognition of such complex patterns that we have 
the means to “make the key” of data modeling. How we “turn the key” of data modeling to unlock the 
information that the data represents will be discussed in future chapters. But for now, we will first focus on 
how to “make the key” of data modeling through complex pattern recognition. 

Albert Einstein once stated, “Everything should be made as simple as possible, but not simpler.” That’s the 
whole idea behind quantum mechanics, the study of the infinitesimal: Break things down to their smallest 
component or quantum. If we did the same with data, we would have a “quantum data element” of sorts; if 
we did the same for information, we would have a “quantum information element” or “quantum meaning” 
of sorts. If we tapped into the relationships between the quantum data elements, we would have “quantum 
syntactical relationships” (in the case of data in the natural language format of words). If we then went 
further in tapping into the relationships between the quantum data elements and their representative 
quantum meanings, we would have “quantum semantic relationships.” To help the reader understand these 
concepts, and remembering that this book is, after all, about discussing software development and how to 
do it right, let’s put things in perspective by going back to the basics of data modeling. 

The simplest and easiest form of data modeling is modeling data that consists only of whole numbers, as we
did at the beginning of Chapter 1. So to review the point, let’s say you have a simple
set of numbers 1, 2, 3, and 4 that are stored as structured data (e.g. the numbers are not stored in a sentence 
of unstructured text), and this data sits in some form of electronic data storage medium such as a relational 
database. It is not the purpose of this book to go into detail regarding relational theory or its use in 
relational database management, the system often used today for storing structured data in the business 
world. Rather, we will stick to the basic principles of data relationships as they bear on the data-centric,
logical layered approach to software development. It is hoped that you, the reader, will be able to get the 
point no matter what technical background you have. 

So let’s say our set of numbers 1, 2, 3, and 4 is stored in that sequence. In other words, if you were to
retrieve that data from the database as is, the first “quantum data element” or smallest component would be
a “1,” the next a “2,” then a “3,” then finally a “4.” From the manner in which it’s stored in series, we can
have a composite number value of “1234” which could represent 1,234 customers, $1,234.00, or model
number 1234 for a lawn mower. The data is the same but the meaning is different. So the “quantum
meaning” or “quantum information element” will be different in each case. For instance, “1234” for 1,234
customers means the number of customers buying a lawn mower; “1234” for $1,234.00 means the selling
price of the lawn mower in dollars; “1234” for model #1234 means the series of numbers making up the
model number for the lawn mower.
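Here is a minimal sketch of the same idea. It assumes the four digits come back from storage as a simple list, and the names below are illustrative only. The quantum data elements are combined positionally into the composite value 1234, and that one value is then paired with each of the three meanings from the example. The integer never changes; only the label that gives it meaning does.

```python
from functools import reduce

# The "quantum data elements" as retrieved from storage, in stored order.
stored_digits = [1, 2, 3, 4]

# Combine them positionally: ((1*10 + 2)*10 + 3)*10 + 4 == 1234.
composite = reduce(lambda acc, d: acc * 10 + d, stored_digits, 0)

# One value, three different "quantum meanings" (illustrative labels).
meanings = {
    "customers": f"{composite:,} customers bought a lawn mower",
    "price": f"${composite:,.2f} is the lawn mower's selling price",
    "model": f"#{composite} is the lawn mower's model number",
}

for label, sentence in meanings.items():
    print(f"{label:>9}: {sentence}")
```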

Now what happens if the same data (“1234”) were stored in three different places, each representing one of
the different kinds of meanings mentioned in the previous paragraph? Well, if the software accessed the
data “1234” for customers when it was looking for the model number, the value would be kind of right in one
way but wrong in another, right? The difference is in the meaning, the information the data represents.
Represent the information correctly in all cases and you’ll have no problems with the software, whether it’s 
an internal reference for the code or external data that the code is processing. The key then is in the logical 
data model, is it not? 
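One possible way to let the data model guard the meaning is to store each value together with what it represents, and to retrieve by meaning rather than by raw value. This is an assumption, not the only way. In this sketch the record type Fact and the function lookup are hypothetical, and the three records mirror the three storage places described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    meaning: str   # what the data represents, e.g. "model_number"
    value: int     # the raw data, identical in all three records

# The same data, 1234, stored in three places with three meanings.
store = [
    Fact("customer_count", 1234),
    Fact("price_dollars", 1234),
    Fact("model_number", 1234),
]

def lookup(meaning: str) -> Fact:
    """Return the single fact carrying the requested meaning."""
    matches = [f for f in store if f.meaning == meaning]
    if len(matches) != 1:
        raise LookupError(f"expected one {meaning!r} fact, found {len(matches)}")
    return matches[0]

# Searching by raw value alone is ambiguous: three hits and no way to choose.
print(len([f for f in store if f.value == 1234]))   # 3
# Searching by meaning returns exactly the fact the code was looking for.
print(lookup("model_number"))
```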

Now if you wrote software code to handle data (e.g. “1234”) for each of the possible meanings that the data 
could represent, you’d end up with more code than data. But with a data model, you can write the code in 
such a way that the software anticipates the different meanings that data could possibly take, if any. It’s 
like a model or prototype for an automobile. When you’re asked the make and model of a car, the make 
tells you the manufacturer, but the model tells you what type of car it is based on a model of that car that
was, say, prototyped as such in the design studio. Once prototyped, a whole assembly line of cars could be 
produced based on that model. The car doesn’t have to be redesigned every time you produce one, even if 
the various features of the car may still differ one from another (e.g. exterior paint color). So likewise, the 
code doesn’t have to be rewritten every time the meaning of the data changes. In fact, it doesn’t even have 
to be rewritten every time the data itself changes if the data’s pattern still fits. That’s the power of data 
modeling. 
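The car-prototype idea can be sketched as a small data model that drives one generic function. Each meaning is described by data, namely a format pattern. Supporting a new meaning, or new data that fits an existing pattern, then means adding a row to the model rather than rewriting the code. The model’s contents, including the hypothetical warranty entry, and the name render are assumptions made for illustration.

```python
# The data model: each meaning maps to a pattern describing how values appear.
MODEL = {
    "customers": "{:,} customers",
    "price": "${:,.2f}",
    "model_number": "model #{}",
}

def render(meaning: str, value: int) -> str:
    """Generic code: it knows nothing about any particular meaning."""
    return MODEL[meaning].format(value)

print(render("price", 1234))   # $1,234.00
print(render("price", 99))     # new data, same pattern: $99.00

# A new (hypothetical) meaning is added to the model; render() is untouched.
MODEL["warranty_months"] = "{} months of warranty"
print(render("warranty_months", 24))
```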

To illustrate, let’s take something a bit more complex than the straight “1234” structured data example. 
Let’s say that we have free-form, unstructured text stored in a set of online documents in a corporate 
Intranet. And among those documents, we have 20 different occurrences of the textual characters “1234” 
as they appear in text. Now in a typical search engine, the documents containing that character set would 
be displayed if the search criterion consisted of just those characters (“1234”). The potential difference in
meaning becomes even more extensive when you consider that the environment in which those characters
sit is unstructured text. Therefore, the only way of knowing the meaning of each occurrence would be through
its context, what linguists call “meaning in use” or “meaning in context,” as opposed to the abstract meaning
inherent in the character set.

On examining the 20 different occurrences more closely by viewing their context, we may find that there is 
a certain pattern to them. For instance, 5 occurrences may deal with the number of customers (1234) that 
have been sold lawn mowers but in the format “1,234” in text (notice the inserted comma). Another 10 
occurrences may indicate the price in dollars (1234) for the lawn mowers sold but in the format 
“$1,234.00” in text (notice the currency characters added). The other 5 occurrences may refer to the model 
number (1234) of the lawn mowers sold but in the format “#1234” in text (notice the pound sign added). 
The pattern in these cases can be deduced in part by the extra characters used. This complex pattern 
recognition can be used in data modeling, especially for natural language processing (of unstructured text). 
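The deduction from the extra characters can be sketched with regular expressions. This is a minimal sketch that assumes only the three formats named above. The pattern list, labels, and sample sentence are illustrative, and real free-form text would surely need more patterns (and more context) than this.

```python
import re

# Each pattern recognizes one way the value 1234 can be written in text,
# paired with the meaning that its extra characters imply.
PATTERNS = [
    (re.compile(r"\$1,234\.00\b"), "price in dollars"),
    (re.compile(r"#1234\b"), "model number"),
    (re.compile(r"(?<![$#\d,])1,234(?![\d.,])"), "number of customers"),
]

def classify_occurrences(text: str) -> list[tuple[str, str]]:
    """Return (matched text, inferred meaning) pairs in order of appearance."""
    found = []
    for pattern, meaning in PATTERNS:
        for m in pattern.finditer(text):
            found.append((m.start(), m.group(), meaning))
    return [(matched, meaning) for _, matched, meaning in sorted(found)]

sample = ("We sold mowers to 1,234 customers. Each mower cost $1,234.00, "
          "and every unit was model #1234.")
for matched, meaning in classify_occurrences(sample):
    print(f"{matched!r:>12} -> {meaning}")
```

The lookbehind on the last pattern keeps the “1,234” inside “$1,234.00” from also being counted as a number of customers.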

But what if there are exceptions to the patterns, “exceptions to the rule” as it were? Well, even exceptions 
in data can have patterns which, of course, can then be modeled. This will be discussed further in the next 
chapter. 



7. The Power Of Parameters & Prototypes 

■ Even exceptions have patterns that can be parameterized 

■ When in doubt, prototype 


We have learned that complex pattern recognition helps tremendously when it comes to data modeling, 
especially the conceptual and logical aspects involved in unstructured data (e.g. in natural language 
processing). What, though, do we do with those many exceptions that can occur, especially in free-form, 
unstructured text or even in structured data for that matter? 

If we go back to the concept of data modeling, we find two parts to data, namely, the data elements
and the data relationships. As implied in the title of this chapter, there’s a lot of power in parameterizing or
describing data elements by using parameters. It is this power that we use to address exceptions. For
instance, in our illustration from the previous chapter, say that we state a simple fact: “There were 1,234
lawn mowers sold.” In this statement reside at least three data relationships: 1) the number of items:
1,234; 2) the type of items: lawn mowers; and 3) the action on the items: they were sold. There is
actually a fourth data relationship that can be derived from the third relationship, namely, that the selling
occurred in the past and therefore was a completed action. In fact, the list could go on and on. For
instance, you could say that fewer than 1,235 lawn mowers were sold or, for that matter, that more than 1,233
lawn mowers were sold. This logic of relationships is what can be used to parameterize.
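The relationships in that one sentence can be captured, roughly, in a small record. The parser below is hypothetical and only understands sentences shaped exactly like “There were N ITEMS sold.” The point is the record it produces. The tense (“were”) yields the derived fact that the sale is a completed action, and the stored quantity yields further relationships by logic alone.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class SaleFact:
    quantity: int    # relationship 1: the number of items
    item: str        # relationship 2: the type of items
    action: str      # relationship 3: the action on the items
    completed: bool  # relationship 4: derived from the past tense

# Hypothetical pattern covering only this one sentence shape.
SENTENCE = re.compile(r"There (were|are) ([\d,]+) (.+?) (sold)\.")

def parse(sentence: str) -> SaleFact:
    m = SENTENCE.fullmatch(sentence)
    if m is None:
        raise ValueError(f"unrecognized sentence: {sentence!r}")
    tense, number, item, action = m.groups()
    return SaleFact(int(number.replace(",", "")), item, action, tense == "were")

fact = parse("There were 1,234 lawn mowers sold.")
print(fact)
# Fewer than 1,235 and more than 1,233 follow from the quantity alone.
print(fact.quantity < 1235, fact.quantity > 1233)   # True True
```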

For example, software code involved in natural language processing may be designed to recognize anything 
having to do with selling lawn mowers. If your logical data model makes provision for the quantity of 
lawn mowers that can be sold, you have a quantitative parameter that can be used in all sorts of scenarios 
such as “were more than 1,233 lawn mowers sold?” or “were fewer than 1,235 lawn mowers sold?” or “were
any lawn mowers sold?” besides the standard question of “how many lawn mowers were sold?”
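Once the logical data model holds the quantity as a parameter, each of those questions becomes a small comparison against the same stored value. The sketch below illustrates this. The question table and the function answer are illustrative assumptions, not part of any particular natural language toolkit.

```python
import operator

# The one quantitative parameter held by the data model.
lawn_mowers_sold = 1234

# Each question is a comparison operator plus a threshold for that parameter.
QUESTIONS = {
    "were more than 1,233 lawn mowers sold?": (operator.gt, 1233),
    "were fewer than 1,235 lawn mowers sold?": (operator.lt, 1235),
    "were any lawn mowers sold?": (operator.gt, 0),
}

def answer(question: str) -> bool:
    compare, threshold = QUESTIONS[question]
    return compare(lawn_mowers_sold, threshold)

for q in QUESTIONS:
    print(q, "Yes" if answer(q) else "No")
print("how many lawn mowers were sold?", lawn_mowers_sold)
```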

Now we come to the issue of exceptions. Let’s say you have an exception to your factual data with such a 
statement as “There were many lawn mowers sold.” instead of “There were 1,234 lawn mowers sold.” The 
software code obviously could not detect the exact number of lawn mowers sold since it was not given. Of 
course, the context that the statement appeared in would have a great bearing on the meaning, but for the 
sake of simplicity, let’s assume that we’re talking about the same situation in context. There is still a 
pattern to the exception statement given. Do you see it? When we say that “there were many lawn mowers
sold,” we are still touching on a quantitative parameter that must have a value greater than 1. That logic 
could be used to address such questions as “were any lawn mowers sold?” or “was there more than one 
lawn mower sold?” or even “were there at least two lawn mowers sold?” to which an answer of “Yes” 
could be given in each instance. 
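One possible way to model such an exception is to store the quantity as a range of possible values rather than a single number. An exact count is then a range of width zero. Following the reasoning above, “many” is recorded as “greater than 1” (at least two, since lawn mowers come in whole units) with no known upper limit. Questions then get “Yes”, “No”, or an honest “Unknown”. The Quantity class is an illustrative assumption.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Quantity:
    low: float    # smallest whole count consistent with the statement
    high: float   # largest whole count consistent with the statement

    def at_least(self, n: int) -> str:
        """Answer "is the quantity >= n?" from what the statement allows."""
        if self.low >= n:
            return "Yes"
        if self.high < n:
            return "No"
        return "Unknown"

exact = Quantity(1234, 1234)   # "There were 1,234 lawn mowers sold."
many = Quantity(2, math.inf)   # "There were many lawn mowers sold." (> 1)

for label, q in [("exact", exact), ("many", many)]:
    print(label,
          q.at_least(1),      # were any sold?
          q.at_least(2),      # more than one, or at least two?
          q.at_least(1235))   # more than 1,234?
```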

It is not the purpose of this chapter to go into detail regarding entities, attributes, relationships, constraints 
and other components that can occur in a data model, but suffice it to say that with enough forethought, one 
can model data in a way that allows for addressing what it truly represents, namely, information. In fact, 
when all else fails (you can’t seem to model the data properly), sometimes the best approach is just to jump 
right in and prototype the process (e.g. write a snippet of software code using a rudimentary data model). 
Much can be garnered by seeing how information is handled through the processing of the data that it will 
represent. You can, in essence, reverse engineer the data model by looking at what would work and what 
would not work, especially when the data relationships get very abstract in their structure. Of course, once 
prototyped, you will want to unit test the process very thoroughly in order to get at the quantum data 
elements and the quantum meanings that they represent. 
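A prototype in this spirit might be no more than a rudimentary data model plus a few unit tests that probe where it breaks. All names in the sketch below are hypothetical. The model covers only numerals and a few quantity words. The tests pin down the cases it handles, and the last test records an exception the prototype does not yet model, which is the kind of finding that feeds back into the data model.

```python
import unittest

# Rudimentary prototype data model: word -> (lowest, highest) possible count,
# where None as the highest value means "no known upper limit".
QUANTITY_WORDS = {"no": (0, 0), "one": (1, 1), "many": (2, None)}

def quantity_range(word: str):
    """Map a quantity word or numeral to a (low, high) range; None if unmodeled."""
    word = word.replace(",", "")
    if word.isdigit():
        return (int(word), int(word))
    return QUANTITY_WORDS.get(word)

class PrototypeTests(unittest.TestCase):
    def test_numeral(self):
        self.assertEqual(quantity_range("1,234"), (1234, 1234))

    def test_many_means_more_than_one(self):
        self.assertEqual(quantity_range("many"), (2, None))

    def test_unmodeled_exception_is_reported(self):
        # "several" is not in the prototype model yet; the test documents the gap.
        self.assertIsNone(quantity_range("several"))

if __name__ == "__main__":
    unittest.main(argv=["prototype"], exit=False)
```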

Once broken down into their simplest constituent components, data modeled in such a way would be 
comprehensive and all-encompassing in their application to the SDLC. Such an undertaking would be 
strikingly different from the typically limiting data modeling done in a code-centric SDLC. It would
produce an architecture that’s simple but powerful nonetheless. The complex part is getting there. It’s like
a gigantic jigsaw puzzle of a thousand pieces that must all be assembled in just the right pattern (there’s
that “pattern” word again) but on completion can yield a simple but striking picture. How many times you 
have to arrange and re-arrange the puzzle pieces to get it just right gives you an idea of how many times 
you have to arrange and re-arrange the data elements and their relationships to get at the information- 
representing pattern behind them. But it’s not impossible. The next chapter will discuss how to go about 
the complex task of keeping your data modeling simple but powerful as you build on it for the software’s 
architecture. 



8. Simple But Powerful 

■ The complex part is keeping it simple while making it powerful 


In building a data model from the ground up, it can become a greater and greater challenge to keep it 
simple while rendering its effect and usage more and more powerful. But it can be done. Remember our 
tools: parameterization, modeling even exceptions, prototyping where necessary, and always respecting
the logical information flow that the data exists in and that the software code will execute in. As alluded to 
in the previous chapter, a data-centric, logical layered approach to software development via data modeling 
and the concomitant tasks associated with software architecture is a lot like assembling a giant jigsaw 
puzzle. Let’s go into further detail regarding the similarities. 

A jumbo jigsaw puzzle has many pieces (there can be many pieces to a software application or system, 
especially if global in reach or destined to become a legacy application or system). Every piece in the 
puzzle will have an exact fit in spatial logic (every component of the software architecture should fit 
logically to the information flow that will be worked with through the representative data). There will 
always be border pieces to the puzzle and these will be easier to work with (you need to border or scope the 
software development project so that the intent matches the need or want of the software’s end user without 
getting out of hand in “bells and whistles”; such scope can be easily determined by looking closely at the 
constraints inherent in the processes used in the information flow). Some non-border puzzle pieces will be 
easier to work with than others (in using pattern recognition, you will find that some data elements and 
their relationships are easier to note and model than others that are more subtle, elusive, or even hidden like 
layers in an onion that get exposed only after peeling the outer layers). 

Further similarities exist. The resulting picture will not be complete until the last puzzle piece is in, but 
then you don’t have any more pieces (only when the data modeling has been thoroughly performed and the 
SDLC steps are completed is the software ready for use, and its original intent should be fulfilled to the 
point that further changes aren’t necessary unless the end user “changes the picture”). If the manufacturer 
wants to modify the picture that the jigsaw puzzle would portray, the pieces used, reused, or replaced 
would have to still fit proportionately to the change (any changes to software due to desired or needed 
changes in the information flow would have to fit correctly in the overall software architecture and its 
associated data models). And while working with the puzzle pieces, you need to keep in mind the big
picture, literally; that’s why the picture is displayed on the puzzle box for you (in data modeling, you need
to keep the “big picture” in mind, e.g. the overall software architecture and the overall information flow,
while drilling down to the small details).

Yes, putting together a large jigsaw puzzle and developing software have a lot in common. In both cases, 
they’re “solving the puzzle.” In both cases, the goal (finish the big picture), the process (put the pieces 
together properly), and the components (the pieces) are simple. In both cases, it can be complex at times to 
solve the puzzle. In both cases, the result can be an eloquent picture. In both cases, it takes thought. Let’s 
look at some suggestions on how such thought can be carried out, especially when things get really 
complex. 

Using again the analogy of the giant jigsaw puzzle, suppose you have a non-border piece that has no 
variation in color and shade (e.g. for a clear, blue sky). About all you have to go on is the shape of the 
piece. You look for other pieces, perhaps of the same color and shade, that would have correspondingly 
opposite shapes so as to make a fit. Let’s see how this happens when you’re fitting in a “piece of the 
puzzle” in data modeling that involves, say, changing how you press the Send button in an e-mail 
application (e.g. from using a mouse to using shortcut keys). Let’s first consider how the original step was 
incorporated into the information flow by starting at the conceptual data modeling stage.

What will be involved in the information flow is a process step. For all intents and purposes, conceptual 
data modeling involves processes as well as data since a process is, in a way, a form of data - just with 
different properties that are often more dynamic than, say, structured data elements per se. A process-type 
data element will represent what transpires, for instance, during a logical business transaction in delta time.
In other words, it represents the actual flow of information components from one stage to another. So
processes are part and parcel of the information flow, and processes can be modeled since they can have
patterns too (this, by the way, is what BPM, or business process management, efforts involve).
We could refer to such modeling as process modeling instead of data modeling. But for the sake of
universality, and since a process is a form of data, we will use the term “data modeling” to also include the 
modeling of processes. 

Now in our example, the applicable logical business transaction is found to be, by definition, the sale of a 
lawn mower to a customer. And we learn that in the context of the information flow, the customer is a 
business entity that doesn’t pay right away but has the sale charged to a business account and invoiced. We 
subsequently learn that one process step in that logical business transaction is the forwarding of a soft copy 
invoice to the customer. On further examination, we find that there are several layers of descendant 
processes involved (child processes, grandchild processes, etc.). Drilling down to the smallest logical step 
(representing the quantum meaning in our information flow of forwarding the invoice copy), we find the 
process step, namely, the sending of an e-mail with an attachment of the soft copy invoice. This addresses 
our logical data modeling. 
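The drill-down just described can be sketched as a tree of processes, with each node a process-type data element holding its child processes. The node names, and the exact number of intermediate levels, are assumptions filled in for illustration; the text says only that there are several descendant layers. drill_down simply follows one path from the logical business transaction down to its smallest logical step.

```python
from dataclasses import dataclass, field

@dataclass
class Process:
    """A process-type data element and its descendant processes."""
    name: str
    children: list["Process"] = field(default_factory=list)

    def drill_down(self, path: list[str]) -> list[str]:
        """Follow the child names in `path`; return every process visited, root first."""
        visited, node = [self.name], self
        for step in path:
            matches = [c for c in node.children if c.name == step]
            if not matches:
                raise KeyError(f"{node.name!r} has no child process {step!r}")
            node = matches[0]
            visited.append(node.name)
        return visited

# The logical business transaction and (some of) its descendant processes.
sale = Process("sell lawn mower to business customer", [
    Process("charge sale to business account"),
    Process("invoice customer", [
        Process("forward soft copy invoice to customer", [
            Process("send e-mail with soft copy invoice attached"),
        ]),
    ]),
])

levels = sale.drill_down(["invoice customer",
                          "forward soft copy invoice to customer",
                          "send e-mail with soft copy invoice attached"])
for depth, name in enumerate(levels):
    print("  " * depth + name)   # indentation shows parent, child, grandchild...
```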

The physical data (or process) modeling would entail what actually takes place physically to go through 
this logical step. And that, we find, is the preparing of the e-mail in the e-mail application, the attaching of 
the soft copy invoice to the e-mail, then the use of the mouse to press the Send button to forward the e-mail 
to the customer’s e-mail address. We have peeled the onion to the innermost layers. We have looked at all 
the angles and shapes of our “puzzle piece.” Now to fit the piece into the grand scheme of things (changing 
how the Send button is to be pressed), all we have to do is reverse engineer the information flow (use 
shortcut keys to press the Send button at the above-described descendant process level as defined in the 
data model, and make sure the subsequent fit carries logically up the ladder of processes to encompass the 
entire information flow without disruption). 

In the data-centric, logical layered approach to the SDLC, if the above-described process (using shortcut
keys to press the Send button) were to be automated in software, whatever code was used (no matter
how fancy the algorithm) to implement the simulated keying in of such shortcut keys would have to
respect the position of that descendant process level as defined in the data model. Otherwise, it won’t work
properly, plain and simple.
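The physical layer can be sketched as an ordered list of steps attached to the smallest logical step. All step names are illustrative, and no particular e-mail client or key binding is implied. Changing how Send is pressed then means swapping one physical step in place. The logical process it belongs to, and the other steps around it, stay exactly where the data model puts them.

```python
# Physical steps attached to the smallest logical step (the leaf process).
physical_model = {
    "send e-mail with soft copy invoice attached": [
        "prepare e-mail in e-mail application",
        "attach soft copy invoice to e-mail",
        "press Send button with mouse",
    ],
}

def replace_physical_step(model: dict[str, list[str]], process: str,
                          old: str, new: str) -> dict[str, list[str]]:
    """Return a copy of the model with one physical step swapped in place.

    Only the step list of `process` changes; which processes exist, and the
    order of the other steps, stay exactly as they were.
    """
    steps = model[process]
    if old not in steps:
        raise ValueError(f"{old!r} is not a step of {process!r}")
    updated = dict(model)
    updated[process] = [new if s == old else s for s in steps]
    return updated

new_model = replace_physical_step(
    physical_model,
    "send e-mail with soft copy invoice attached",
    "press Send button with mouse",
    "press Send button with shortcut keys",
)

for step in new_model["send e-mail with soft copy invoice attached"]:
    print("-", step)
```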

To help clarify for the reader the points made in the above e-mail process example, let’s take a simpler
example. Picture a simple software program that is indeed powerful because it can perform a lot
of mathematical calculations in a split second. Now our simple (but powerful) calculation program will 
have a hierarchy of instructions. One group of instructions may handle the input (e.g. what numbers to 
work with in calculating) and form one instruction set. Another set of instructions may handle the actual 
calculation processing. A third instruction set may handle the output, displaying the results in some format. 
And these three instruction sets are part of an overall instruction set making up the calculation program. 
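The three-part hierarchy of that calculation program might look like the following minimal sketch. The function names, and the choice of a sum and an average as “the calculation,” are assumptions for illustration.

```python
def read_input(raw: str) -> list[float]:
    """Instruction set 1: input, i.e. which numbers to calculate with."""
    return [float(tok) for tok in raw.split()]

def calculate(numbers: list[float]) -> dict[str, float]:
    """Instruction set 2: the calculation processing itself."""
    total = sum(numbers)
    return {"sum": total, "mean": total / len(numbers)}

def write_output(results: dict[str, float]) -> str:
    """Instruction set 3: output, displaying the results in some format."""
    return ", ".join(f"{name} = {value:g}" for name, value in results.items())

def calculation_program(raw: str) -> str:
    """The overall instruction set: input, then calculation, then output."""
    return write_output(calculate(read_input(raw)))

print(calculation_program("1 2 3 4"))   # sum = 10, mean = 2.5
```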

So software programs are really just lists of instructions. But unlike standard jigsaw puzzle pieces, these 
instructions can be rearranged into infinite variations. It would be like having universal jigsaw puzzle 
pieces that can be rearranged into an endless number of different pictures. Now each instruction in our 
calculation program is really just a function with code behind it, using parameters that the programmer 
supplies. Rearranging or expanding such instructions will still permit the software program to run properly 
as long as those instructions are arranged in a way that will properly support the data and the flow of 
information that the data will represent. For instance, our instruction set for handling the actual calculation 
processing could perhaps be expanded into several instruction sets such as one for addition, one for 
subtraction, etc. Or the logic flow in the code could be rearranged so that the output feeds the input for a 
certain number of iterations. 
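Both rearrangements can be sketched as follows, with hypothetical names. The single calculation step is expanded into separate addition and subtraction instruction sets, selected by a parameter. A loop then feeds each output back in as the next input for a fixed number of iterations. The data flowing through keeps the same shape at every step, which is what lets the rearranged program keep working.

```python
import operator

# The calculation step expanded into several smaller instruction sets.
OPERATIONS = {
    "add": operator.add,
    "subtract": operator.sub,
}

def calculate(op: str, a: float, b: float) -> float:
    return OPERATIONS[op](a, b)

def iterate(op: str, start: float, step: float, times: int) -> list[float]:
    """Feed each output back in as the next input, `times` times."""
    results, value = [], start
    for _ in range(times):
        value = calculate(op, value, step)   # output becomes the next input
        results.append(value)
    return results

print(iterate("add", 0, 5, 4))          # [5, 10, 15, 20]
print(iterate("subtract", 100, 30, 3))  # [70, 40, 10]
```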

Most instructions in software programs can be categorized into four kinds of process arrangements 
(iterative, conditional, parsing, or data processing), and you can have one type of instruction embedded in 
another type of instruction. Iterative is when there’s a repeating of the same steps (e.g. “for” or
“For...Next” statements in code). Conditional is for executing a step if a certain condition applies (e.g. “if”
or “If...Then...Else” statements in code). Parsing is when you take the step of extracting and/or
reassembling a piece or pieces of data out of a stream of data (e.g. “strstr” or “Instr” statements in code).
Data processing is for referencing data (e.g. a file) in order to do something to it (e.g. move, copy, save,
delete, etc.).
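All four arrangements, including instructions embedded in other instructions, appear together in the short Python sketch below. Python’s for, if, and str.find stand in for the “For...Next,” “If...Then...Else,” and “strstr”/“Instr” statements mentioned above. The order data is invented, and the file name orders.txt is a placeholder written to a temporary directory.

```python
import tempfile
from pathlib import Path

orders = ["order 1001: 3 lawn mowers", "order 1002: 0 lawn mowers",
          "order 1003: 12 lawn mowers"]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "orders.txt"
    path.write_text("\n".join(orders))          # data processing: save a file

    shipped = []
    for line in path.read_text().splitlines():  # iterative: repeat for each line
        colon = line.find(":")                  # parsing: locate the delimiter
        qty = int(line[colon + 1:].split()[0])  # parsing: extract the quantity
        if qty > 0:                             # conditional: act only if true
            shipped.append(line[:colon])
        # the parsing and the conditional are embedded inside the iteration

print(shipped)   # ['order 1001', 'order 1003']
```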

From these four instruction types, you can have an infinite variety of parameters that the instructions would 
use, and an infinite variety of how they could be arranged. But let’s emphasize the point again regarding 
the data-centric, logical layered approach to the SDLC. No matter how complex the instruction sets are
arranged in your software program’s code, if you break down all those sets into their simplest components, 
namely, the instructions themselves, all those instructions and their interrelationships with each other have 
to respect their logical positions in the data model. Otherwise, your software program won’t work 
properly, plain and simple. 



9. If It Can Be Conceptualized, It Can Be Digitized 

■ The digitization of data that represents information is just another medium for the expression of 
ideas and concepts 


Previous chapters have shown how a data-centric, logical layered approach to software development can be 
implemented. Since software runs in a binary or digital environment, a further examination of the digital 
nature of this environment (and the tremendous potential inherent in that nature) is in order.

Many schools of thought adopt the view that all kinds of information can be stored in digital format. We 
have discussed how this is possible as long as the information contains enough patterns to be properly 
represented through data in digital format. Certain kinds of information, though, such as the actual 
experience of an emotion or feeling, may never be storable in digital format. The experience may be 
describable, but it won’t be understood by any machine, even if a machine could emulate the expression of 
that emotion, for instance through an android’s external, silicon-based facial expression.

Art has been doing that for centuries, and it effectively conveys information on emotions whenever a human 
viewer relates to its visual contents. But no matter how moving a masterpiece may be, it does not 
experience any emotions itself. So whether it is art (which is really data in visual format) or any other 
form of data, it simply acts as a medium for communicating information, and only information that is 
representable.

That is why this chapter’s subtitle includes the phrase “the expression of ideas and concepts” instead of just 
“ideas and concepts.” There is a subtle difference here. An idea or concept is always abstract. It can 
reside in and be understood by the human mind, but you can’t pack it in your book bag. You may have a 
book in your book bag that describes the idea or concept, but then what you have in your bag is only data 
which happens to represent (accurately or inaccurately) information on the idea or concept. 

So when this chapter’s title says “if it can be conceptualized, it can be digitized,” we’re talking about the 
conceptualization of information to the point of having it represented through data in a logical order. We 
obtain that logical order when we derive patterns for data modeling from the information itself. If we can’t 
do so (as in the case of trying to represent the experience of an emotion), we can’t digitize the information; 
if we can, we can digitize it. That is why the Information Age is moving more and more toward the 
digitization of data. And once data has been digitized, computing machines can work with it very 
efficiently, which is why so many things are getting automated nowadays.
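As a small illustration of deriving a pattern from information and then digitizing it, consider a calendar date. The pattern (year, month, day) comes from the information itself, and once it is known, the date can be laid out as a fixed sequence of bytes. The sketch assumes a `YYYY-MM-DD` text form and a hypothetical 4-byte layout; the function names are illustrative. Text that does not match the pattern is rejected, much as the experience of an emotion resists any pattern.

```python
import re
import struct

# Hypothetical pattern derived from the information "a calendar date".
DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def digitize_date(text: str) -> bytes:
    """Map a date written as YYYY-MM-DD onto a fixed 4-byte layout."""
    match = DATE_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"no recognizable pattern in {text!r}")
    year, month, day = (int(part) for part in match.groups())
    # Assumed layout: big-endian unsigned 16-bit year, then one byte each for month and day.
    return struct.pack(">HBB", year, month, day)


def undigitize_date(data: bytes) -> str:
    """Reverse the layout above back into YYYY-MM-DD text."""
    year, month, day = struct.unpack(">HBB", data)
    return f"{year:04d}-{month:02d}-{day:02d}"


encoded = digitize_date("2004-03-15")
print(encoded.hex())             # 07d4030f
print(undigitize_date(encoded))  # 2004-03-15
```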

In fact, the potential for automation is awesome. Consider digitizing data to the point of being able to 
process it so dynamically that you could constantly change your software application on the fly. This has 
been a dream of use case tool producers for years: Trigger a process that automatically produces software 
code. The only problem with that concept is that it’s still a code-centric approach vs. the data-centric 
approach advocated in this book. 

But what if you could build a “front end” or user interface for inputting different elements (such as GUI 
web objects for data entry or navigation) into your data modeling? The modeling itself could even be 
greatly assisted by automation, with the system calculating the impact of new or changed elements in your 
software application. The rules inherent in the many data relationships in the software architecture could 
be built into the process so as to maintain that code-data-information connection. The more encompassing 
the process, the further you could go with automation, with the highly desired goal of automating your 
entire project management process, at least on the software architecture side. And at the press of a 
button, you could have the system produce a mock-up or prototype interface (e.g. a web page) to show the 
resulting change. What a boon to end user feedback and IAD sessions if the system could tell you virtually 
instantly whether and how a new or changed idea or suggestion would affect your software application! 
That’s the inherent power in digitizing data.
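A rough sketch of the two capabilities just described, impact calculation and a push-button mock-up, might look like the following. It assumes a hypothetical model in which each element lists the elements that depend on it; the impact of a change is then every element reachable from it, found here with a breadth-first walk. The element names, function names, and the HTML produced are illustrative only.

```python
from collections import deque
from html import escape

# Hypothetical model: each element lists the elements that depend on it.
MODEL = {
    "customer_name_field": ["customer_form", "order_summary_page"],
    "customer_form": ["registration_page"],
    "order_summary_page": [],
    "registration_page": [],
}


def impact_of(changed: str, model: dict[str, list[str]]) -> list[str]:
    """Return every element affected, directly or indirectly, by a change (breadth-first)."""
    queue = deque(model.get(changed, []))
    seen = set(queue)
    affected = []
    while queue:
        element = queue.popleft()
        affected.append(element)
        for dependent in model.get(element, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return affected


def mock_up(fields: list[str]) -> str:
    """Render a throwaway HTML form from field names taken from the model."""
    rows = "\n".join(f'  <label>{escape(f)} <input name="{escape(f)}"></label>' for f in fields)
    return f"<form>\n{rows}\n</form>"


print(impact_of("customer_name_field", MODEL))
# ['customer_form', 'order_summary_page', 'registration_page']
print(mock_up(["customer_name", "email"]))
```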

Remember the expressions “quantum data element” and “quantum meaning” used in Chapter 6 when we 
discussed pieces of information flow and the representative data components being broken down to their 
smallest size. This breakdown is key to proper digitization because any data in digital format exists only as 
1’s and 0’s. Every algorithm, every computing process, every data input or output pathway, and every 
storage device in the world of computers depends on how well that breakdown is carried out, since 
computers can only “think” mathematically in 1’s and 0’s. The gap between the abstract information side of 
things and those 1’s and 0’s can be huge indeed but, as we have seen, it is not impossible to bridge. The next 
chapter will discuss what the nature of such a bridge would have to be for it to be workable and viable.
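To see how far down that breakdown goes, the short Python sketch below shows the bits behind the four characters “1234” (encoded as UTF-8 text) and behind the number 1234 (as a binary integer). The helper name `to_bits` is hypothetical. The two results differ, a small reminder that how data is modeled decides which 1’s and 0’s end up being stored.

```python
def to_bits(data: bytes) -> str:
    """Show each byte as its eight 1's and 0's."""
    return " ".join(f"{byte:08b}" for byte in data)


text_bits = to_bits("1234".encode("utf-8"))  # the characters 1, 2, 3, 4
number_bits = format(1234, "b")              # the number one thousand two hundred thirty-four

print(text_bits)    # 00110001 00110010 00110011 00110100
print(number_bits)  # 10011010010
```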



10. An Optimal Architecture For Digitally Processing Data 

■ Logical and geometrically Riemannian in structure 

■ Dynamic in nature 

■ Linear in results 

■ Spherical in conceptualization and functionality 


It has been argued throughout this book that a data-centric, logical layered approach to software 
development is the way to go if we are to bridge the potentially huge gap between the abstract information 
side of things and the only thing that computers process, namely, the 1’s and 0’s of the digital world. An 
optimal architecture for such an approach has to take into consideration the need and desire to digitally 
process data in a way that keeps it simple while making it powerful. The principles for such an 
architecture have been outlined throughout the book, so let’s now address what such a software 
architecture would look like.

Its structure would have to be logical in nature (vs. physical), since you can’t place the components of 
software into some type of physical architecture as you can for hardware. And in a data-centric approach, 
your focus is on what will represent information, something totally abstract and conceptual. This forces 
the need for logical relationships between the information and the data that represents it. By extension, 
good software architecture has to be logical too, since it has to provide the proper environment for the data 
to be processed correctly. And that logical structure should be very malleable too, since it would be built on 
the underlying logic and order of things rather than just on lists of things.

For example, hard coding something when writing software code addresses something very specific and, by 
its very nature, deviates from the software architecture’s logical data model. Such code is like hardware, 
fixed in its position of operation. At times, hard coding may serve a purpose, such as in addressing very 
specific exceptions to patterns of data in our data modeling, but it should be used sparingly, if at all. We 
saw in Chapter 7 the power of parameterization when we tap into the modular nature of patterns. Let’s take 
advantage of that and keep the architecture logical in nature.
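The contrast can be shown with a toy example. Both functions below compute a hypothetical shipping fee by region; the first hard codes the values, while the second keeps the same pattern (a region maps to a fee, with a default) but takes the specifics as parameters. The regions, amounts, and function names are invented for illustration.

```python
from collections.abc import Mapping


# Hard coded: the rule is fixed in the code, like a piece of hardware.
def shipping_fee_hard_coded(region: str) -> float:
    if region == "north":
        return 5.0
    if region == "south":
        return 7.5
    return 10.0


# Parameterized: the same pattern, with the specifics supplied as data.
FEE_TABLE = {"north": 5.0, "south": 7.5}
DEFAULT_FEE = 10.0


def shipping_fee(region: str, fees: Mapping[str, float] = FEE_TABLE,
                 default: float = DEFAULT_FEE) -> float:
    """Look the region up in a fee table, falling back to a default."""
    return fees.get(region, default)


print(shipping_fee("south"))                             # 7.5
print(shipping_fee("east", {**FEE_TABLE, "east": 6.0}))  # 6.0
```

Supporting a new region in the parameterized version is a change to data, not to code.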

We also say that the structure will be geometrically Riemannian. This expression sounds complicated, as if 
it came out of a mathematician’s thesis, but a brief explanation should clarify matters for the reader. High 
school geometry (squares, triangles, spheres, etc.) is Euclidean geometry as we experience it, in at most 
three dimensions. For instance, the formula for the volume of a sphere is $V = \frac{4}{3}\pi r^3$, where $r$ is 
the radius of the sphere. If you tried to add another dimension to the sphere, you could do so mathematically 
but could not portray that extra dimension either on paper (a two-dimensional surface) or as a real-life 
specimen (a three-dimensional model). Working with such extra dimensions calls for more general 
formulations than those of everyday three-dimensional geometry, and Riemannian geometry is one such 
framework: it describes spaces of any number of dimensions, including curved ones. Such a 
multidimensional mathematical tool is often used in disciplines such as theoretical physics (e.g. when 
studying string theory).
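To make “you could do so mathematically” concrete, the sphere-volume formula is the three-dimensional case of a general formula for the volume of a ball of radius $r$ in $n$ dimensions, where $\Gamma$ is the gamma function. Note that this general formula still belongs to flat (Euclidean) $n$-dimensional space; it illustrates how an extra dimension is handled on paper, not the Riemannian claim itself.

```latex
V_n(r) = \frac{\pi^{n/2}}{\Gamma\left(\frac{n}{2} + 1\right)}\, r^n

% n = 3 recovers the familiar formula, since \Gamma(5/2) = \tfrac{3}{4}\sqrt{\pi}:
V_3(r) = \frac{\pi^{3/2}}{\tfrac{3}{4}\sqrt{\pi}}\, r^3 = \frac{4}{3}\pi r^3

% n = 4 adds one dimension, since \Gamma(3) = 2:
V_4(r) = \frac{\pi^2}{2}\, r^4
```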

Now when we are constructing our data modeling at the conceptual, logical, and physical stages, we are 
looking at relationships, many of them in fact. And these relationships can have relationships within 
relationships. To a certain extent, such relationships could be graphed with ruler and compass, using 
straight lines and circles to represent their logical linkages. But with so many potential hierarchical layers 
of relationships, once we pass three layers, the mathematics of their geometry would become Riemannian 
in nature. Whether Riemannian geometry will prove useful in, say, the mathematical calculations of 
complex algorithms used in software code for natural language processing, time will tell, but the 
possibilities are intriguing.

The software architecture, to be optimal, would need to be dynamic in nature. In other words, its very 
essence has to be dynamic. A data-centric, logical layered approach would demand nothing less. If the 
data represents information that exists in information flow, flexibility becomes a key ingredient for 
software that is going to be used and reused again and again. Huge amounts of time get spent maintaining 
software after its initial release if this principle is not respected. It’s exactly like shooting a movie. If your 
camera is on a fixed tripod and the scene keeps moving left or right, you’d have to keep moving the whole 
tripod unless it had a pivot that lets you turn just the camera. So the more flexible and dynamic your entire 
software architecture, the more freedom of movement you will have in its operation.
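In code, one common way to give an architecture that kind of pivot is a registry: behavior is looked up by name at run time instead of being wired into the caller, so new behavior can be added without touching existing code. The Python sketch below is our own illustration of that general pattern, not a design the book prescribes; the names `HANDLERS`, `register`, and `process` are hypothetical.

```python
from typing import Callable

# Hypothetical registry mapping a behavior name to the function that implements it.
HANDLERS: dict[str, Callable[[str], str]] = {}


def register(kind: str):
    """Decorator that adds a function to the registry under the given name."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        HANDLERS[kind] = func
        return func
    return decorator


@register("upper")
def to_upper(text: str) -> str:
    return text.upper()


def process(kind: str, text: str) -> str:
    """Dispatch at run time; process() itself never changes when new behaviors arrive."""
    try:
        handler = HANDLERS[kind]
    except KeyError:
        raise ValueError(f"no handler registered for {kind!r}") from None
    return handler(text)


# A new behavior plugged in later, without editing process():
register("reverse")(lambda text: text[::-1])

print(process("upper", "data"))    # DATA
print(process("reverse", "data"))  # atad
```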



What kind of results would such an architecture yield, then? In contrast to the structure and nature of our 
architecture, the results would be strictly linear (a straight line of resultant data), and here’s why. As 
already explained, computers directly process only 1’s and 0’s. They read data as 1’s and 0’s, and they 
write data as 1’s and 0’s. What gets produced, the tangible results, whether written to memory or displayed 
on a computer screen, is an amalgam of 1’s and 0’s. We can interpret the results as a picture (e.g. graphics 
on the computer screen based on pixel values derived from a series of 1’s and 0’s) or a stream of words (e.g. 
a document displayed on the computer screen based on font type and size, with character values derived 
from a series of 1’s and 0’s). The raw results are always linear. Our interpretation of the results can take 
many forms depending upon the medium that the results are displayed in and their meaning, a sort of 
reverse engineering of the information flow: information is represented by data, which the computer 
processes as 1’s and 0’s, and the results are output as 1’s and 0’s, which form data that represents 
information.
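The point that the raw result is one linear sequence, with meaning supplied by interpretation, can be shown in a few lines of Python. The same four bytes are read below as text, as a single big-endian integer, and as four grayscale pixel values; the choice of these three readings is ours, for illustration.

```python
raw = bytes([0x31, 0x32, 0x33, 0x34])  # one straight line of 32 bits

as_text = raw.decode("ascii")           # read as characters
as_number = int.from_bytes(raw, "big")  # read as one unsigned 32-bit integer
as_pixels = list(raw)                   # read as four 8-bit grayscale pixel values

print(as_text)    # 1234
print(as_number)  # 825373492
print(as_pixels)  # [49, 50, 51, 52]
```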

Here’s another interesting aspect of an optimal architecture for software. Its conceptualization (how it’s 
conceived) and, by extension, its functionality (how it would function as a unit) would be spherical in 
nature. Why do we say spherical (three dimensions) when the structure is geometrically Riemannian (more 
than three dimensions supported)? Because we’re now talking about how the architecture would be put 
together in the mind and how it would actually work.

To illustrate, we can say that from a physics standpoint, the true structure of a car consists of an almost 
uncountable number of atoms that together make up a steel shell with paint, plastic, foam, rubber and other 
materials added in. But we say that the car functions or works by rolling along on wheels, being steered by 
some sort of steering mechanism. And the car was conceived in the mind of a vehicle designer as a sleek, 
aerodynamically sound automobile in three dimensions. So we have different types of properties for 
structure, function, and conceptualization. 

It’s the same with our software architecture. The structure would be logical and geometrically Riemannian, 
as discussed. But the architecture as conceived in the mind, in order to fulfill our data-centric, logical 
layered approach to the software’s development, would best be described as spherical. How the 
architecture would function could also be described as spherical. Let us see why.

When thinking through how all the logical (and physical) pieces of the software architecture would come 
together, a good software architect will keep the big picture in mind at all times. As discussed in previous 
chapters, the details have to fit in with the overall architecture. But where are the pieces placed in the 
architect’s mind? They’re like stars in the sky, each with its own orbit or “process” that intertwines where 
necessary with the others or around them, so that there is one big order of things, a universe, or cosmos. 
The Greek word for “cosmos” literally means “order.” When looking up into the sky and all around, we 
are looking outward from our central point of vision in a spherical manner. The software architect does the 
same conceptually, with the central point of vision being the core of the architecture itself.
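One way to make the conceptual sphere concrete is to treat the logical data model as a graph of relationships and group its elements into shells by how many relationship hops they are from the core, so that “equidistant” means “the same number of hops.” The sketch below does this with a breadth-first search; the graph, its element names, and the `layers_from` helper are hypothetical, and breadth-first search is our choice of technique, not the book’s.

```python
from collections import deque

# Hypothetical logical data model as an undirected relationship graph.
RELATIONSHIPS = {
    "core": ["customer", "product"],
    "customer": ["core", "order", "address"],
    "product": ["core", "order"],
    "order": ["customer", "product", "order_line"],
    "address": ["customer"],
    "order_line": ["order"],
}


def layers_from(center: str, graph: dict[str, list[str]]) -> list[list[str]]:
    """Group every reachable element by its hop distance from the center (breadth-first search)."""
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    shells: list[list[str]] = [[] for _ in range(max(distance.values()) + 1)]
    for node, hops in distance.items():
        shells[hops].append(node)
    return shells


print(layers_from("core", RELATIONSHIPS))
# [['core'], ['customer', 'product'], ['order', 'address'], ['order_line']]
```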

To engage the reader in another exercise in abstract thought, we now consider the spherical aspect of the 
software architecture’s functionality. While we envisioned the conceptualization of the architecture as a 
single sphere of associated elements, let’s visualize its functionality (in harmony with the dynamic nature 
of the architecture) as multiple, dynamically constructed spheres that appear and disappear during the 
architecture’s usage.

Again to illustrate, suppose we consider the login functionality for a particular software application. 

During that specific login process, certain relationships come into play among the many components that 
would exist in the software architecture, both from a code standpoint and from the standpoint of the data 
being processed by the code. For that moment (in delta time, to be exact), a sphere of functionality exists, 
with the function itself at its central point and, as outbranching links, all the associated elements and 
relationships that become directly or indirectly involved, from both a physical and a logical standpoint, in 
our software architecture in order to fulfill the specified function at the center. This spherical function 
construct could spawn other spheres, which is why we say that spherical function constructs can be 
multiple and are dynamically constructed (according to the particular function need or needs in delta time 
when the software is actually being used).
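A loose sketch of such a functional sphere in Python: the sphere for a function is built only when that function becomes active, holds the data elements it touches plus any spheres it spawns, and is discarded when the function ends. The function and element names, and the idea of listing what each function uses and calls, are hypothetical; the sketch also assumes functions don’t call each other in a cycle.

```python
from contextlib import contextmanager

# Hypothetical description of what each function touches (data elements)
# and which other functions it triggers. Assumed to contain no cycles.
FUNCTIONS = {
    "login": {"uses": ["user_record", "password_hash", "session_token"],
              "calls": ["write_audit_log"]},
    "write_audit_log": {"uses": ["audit_table"], "calls": []},
}


def build_sphere(function: str) -> dict:
    """Build the sphere for one function: its elements plus any spheres it spawns."""
    spec = FUNCTIONS[function]
    return {
        "center": function,
        "elements": list(spec["uses"]),
        "spawned": [build_sphere(child) for child in spec["calls"]],
    }


@contextmanager
def active_sphere(function: str):
    """The sphere exists only while the function is active."""
    sphere = build_sphere(function)
    try:
        yield sphere
    finally:
        sphere.clear()  # the sphere "disappears" when the function ends


with active_sphere("login") as sphere:
    print(sphere["center"], sphere["elements"])
    print([child["center"] for child in sphere["spawned"]])
print(sphere)  # {}
```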



We can use another analogy for this multiple sphere concept when we look at how a computer’s operating 
system works. For instance, a UNIX operating system will have commands (coded instructions) that can 
be tied together using pipes (software connections that feed one command’s output into the next 
command’s input) to create a logic flow. When different pipes are used or when the pipes change their 
relationships (with commands and, by extension, with each other), the “sphere” of pipe relationships 
emanating from the function of a system command (currently being executed) changes as the system 
command (to be executed) changes. It is somewhat like a grid of traffic (e.g. in a metropolitan area having 
many interlinking roads) but three-dimensional in scope. And as brought out in Chapter 8, these 
interlinking pieces of coded instructions (commands) can be rearranged in an endless number of ways as 
long as they support the underlying logical data model in the architecture of the software, in this case the 
system software that makes up the operating system.
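The pipe idea can be mimicked in Python by treating each “command” as a function from a stream of lines to a stream of lines and connecting them in sequence. The stages below are hypothetical stand-ins named after `grep`, `sort`, and `uniq -c`, not the real utilities. The last pipeline shows the caveat from the text: stages can be rearranged freely only while the arrangement respects what the data requires (here, that counting adjacent duplicates needs sorted input).

```python
from functools import reduce
from itertools import groupby
from typing import Callable, Iterable

# A stage takes a stream of lines and returns a stream of lines, like a command in a pipe.
Stage = Callable[[Iterable[str]], Iterable[str]]


def grep(pattern: str) -> Stage:
    """Keep only lines containing pattern (a stand-in for the real grep)."""
    return lambda lines: (line for line in lines if pattern in line)


def sort_lines(lines: Iterable[str]) -> Iterable[str]:
    return sorted(lines)


def uniq_count(lines: Iterable[str]) -> Iterable[str]:
    """Count runs of adjacent identical lines, in the spirit of `uniq -c`."""
    return (f"{len(list(group))} {line}" for line, group in groupby(lines))


def pipe(*stages: Stage) -> Stage:
    """Connect stages so each one's output feeds the next one's input."""
    return lambda lines: reduce(lambda stream, stage: stage(stream), stages, lines)


log_lines = ["error disk", "ok", "error net", "error disk"]

print(list(pipe(grep("error"), sort_lines, uniq_count)(log_lines)))
# ['2 error disk', '1 error net']
print(list(pipe(sort_lines, uniq_count, grep("error"))(log_lines)))
# ['2 error disk', '1 error net']  (rearranged, same result)
print(list(pipe(uniq_count, sort_lines)(log_lines)))
# ['1 error disk', '1 error disk', '1 error net', '1 ok']  (counting before sorting breaks the counts)
```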

So let’s review what we would need in an optimal architecture to build really good quality software. The 
structure would have to be logical (in order to be flexible to the data and the flow of information that the 
data would represent). Its structure would be geometrically Riemannian as soon as you deal with more than 
three dimensions (or logical layers) in your logical data model. And by its very nature, the architecture 
would be dynamic due to its flexibility and malleability. The results would have to be linear, as that is how 
computer systems work when producing output. When we try to conceptualize our optimal architecture 
model, the concept of a sphere fits best. This is because every point on a sphere’s surface is the same 
distance from its central point, and in the same way, the relationships within our architecture are arranged 
equidistantly around a center. In other words, everything within a given layer of substance (each layer or 
dimension in our logical data model) lies at the same hierarchical distance from the central point (the 
conceptual core of the software architecture). And for the architecture’s functionality, the sphere model 
again works best, except that we now have multiple spheres. From each functional sphere’s central point 
(the currently active function of the software during use) emanate equidistant layers of the logical data 
model’s data elements and relationships pertinent to the currently active function. Intriguing, no?



Epilogue 

■ Keep the software soft 


By now you, the reader, may feel that you have been taken on a journey of sorts, through the cliffs, valleys, 
peaks, and plains of the realms of information and data rather than just the realm of software code. But 
hopefully, this point will have stood out: You can’t produce good quality software without a proper respect 
for the data that it will process and for what that data represents - information. And that respect entails 
bending the software to fit the needs inherent in processing data due to the data’s role in representing 
information. 

That’s why we say, “keep the software soft.” The moment you “hard code” something, you introduce a 
rigid structure that may be necessary but is often not. Why? Because information doesn’t work that way. 

It flows like water; it doesn’t just sit there like a rock. But, you may ask, what about historical information 
that has been “written in stone” since times past? Well, look closely at your historical information. Don’t its 
relationships change as other information relating to it changes over time? Just as “no man is an island,” no 
piece of information is an island in isolation.

By extension, any data used to represent information cannot be viewed as existing in total isolation. It has 
reactive qualities, just as chemical elements do. Change the context of the data enough and you may 
change what it represents. For instance, remember our “1234” data example? Its different contexts 
literally produced different representations of information (e.g. a number of lawn mowers, a number of 
dollars, and a model number). So the data must flow with the information.
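The “1234” example can be sketched in a few lines of Python. The raw data stays the same; only the context changes, and with it the information represented. In keeping with “keep the software soft,” the contexts are held as data in a lookup table rather than hard coded in branches. The context names, output wording, and the `interpret` helper are hypothetical.

```python
from typing import Callable

# Hypothetical contexts: the same raw data yields different information in each.
INTERPRETATIONS: dict[str, Callable[[str], str]] = {
    "inventory": lambda raw: f"{int(raw)} lawn mowers",
    "price": lambda raw: f"${int(raw):,}",
    "catalog": lambda raw: f"model number {raw}",
}


def interpret(raw: str, context: str) -> str:
    """Turn raw data into information by applying the interpretation for its context."""
    return INTERPRETATIONS[context](raw)


for context in INTERPRETATIONS:
    print(f"{context}: {interpret('1234', context)}")
# inventory: 1234 lawn mowers
# price: $1,234
# catalog: model number 1234
```

Adding a new context means adding an entry to the table; `interpret` itself stays untouched.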

Since software code is used to process data, the code itself also has to flow with the data, and in a way 
that aligns with how the data flows with the information that the data represents. So again, we say “keep 
the software soft” by using a data-centric, logical layered approach in its SDLC, and your software will 
thank you for it.


About the Author 

At the time of this writing, George Kobak has been working as a test automation architect for a Fortune 500 
company, where he has been implementing and teaching many of the principles outlined in this book.

